I have ruled out several possible causes, but I still can't solve this and it is driving me crazy.
I run a larger VPS hosting LXD/LXC containers. Packets are routed from eth0 to the containers, which have globally routable IPv6 addresses. The containers are connected to Cloudflare via IPv6 (AAAA records) only, and Cloudflare provides the IPv4 compatibility. The Cloudflare IPs are whitelisted in UFW on both the host and the containers.
The 522 outage happens when I am very active on a site or in the Webmin panels I have installed in each of my containers. When Cloudflare gets blocked, it seems to affect only my own IP, which I attribute to the nginx rate limiting (although I am not sure why it also happens with Webmin). During an outage I can still reach the site through a VPN connection, i.e. from another IP, or through a GTmetrix retest, and I can reach Webmin directly using the IPv6 address and the port. Uptime monitoring also doesn't report any downtime. In the Webmin panel I see barely any load on the container or the host system during an outage.
So far so good: it has to be some kind of filtering.
I have already tried increasing these nginx limits:
# Limit Request
limit_req_status 403;
limit_req_zone $binary_remote_addr zone=one:10m rate=150r/s;
limit_req_zone $binary_remote_addr zone=two:10m rate=550r/s;
to ridiculous values in one container, for one site, to rule that out. But it happened again while I was working on the site in that container.
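For context, zones defined with limit_req_zone only take effect where a limit_req directive references them. The following is a minimal sketch of that wiring, not a copy of the actual config: the listen port, server name, location paths and burst values are all assumptions, and the block is meant to sit inside the same http block as the zone definitions above.

```nginx
# Sketch only: placement, paths and burst values are assumptions,
# not taken from the real configuration.
server {
    listen [::]:80;             # containers are reachable over IPv6 only
    server_name example.com;    # placeholder name

    location / {
        # Requests above the zone's rate are queued up to "burst";
        # "nodelay" serves queued requests immediately instead of pacing them.
        limit_req zone=one burst=50 nodelay;
    }

    location /static/ {
        # Second zone with the higher rate for asset-heavy paths.
        limit_req zone=two burst=200 nodelay;
    }
}
```

With limit_req_status 403 set as above, a request rejected by these limits is answered with a 403 and logged in the nginx error log as a "limiting requests" entry naming the zone.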
Could it be that keepalive with 500 connections is still too low?
Should we use a different nginx where we can use "allow cloudflare-ipv6"?
What do I do?
Any help would be much appreciated!
Whoever actually helps me figure this out will get a few cups of coffee's worth from me through PayPal, or alternatively 200 g of the best black tea in the world, which I'll send anywhere you want.
Thanks in advance!