Reputation: 808
I am running nginx as part of a docker-compose template.
In the nginx config I refer to other services by their Docker hostnames (e.g. backend, ui).
That works fine until I do that trick:
docker stop backend
docker stop ui
docker start ui
docker start backend
which makes the backend and ui containers exchange IP addresses (Docker assigns private network IPs by handing the next available address in the CIDR range to each new requester). These four commands imitate the rare case where both upstream containers are restarted at the same time while the nginx container is not. I also believe this should be a very common situation when running pods on Kubernetes-based clusters.
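For reference, one way to watch the swap happen is to print each container's current IP before and after the four commands above; this assumes the containers are literally named backend and ui:

# Prints one IP per container; run before and after the stop/start sequence
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' backend ui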
Now nginx resolves the backend host to ui's IP and ui to backend's IP.
Reloading the nginx configuration does help (nginx -s reload).
Also, if I run nslookup from within the nginx container, the IPs are always resolved correctly.
So this isolates the problem to nginx itself, specifically its DNS caching.
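For example, something along these lines (assuming nslookup is available in the image and the nginx container is simply named nginx) keeps returning the correct addresses from Docker's embedded DNS at 127.0.0.11:

# Ask Docker's embedded DNS directly from inside the nginx container
docker exec -it nginx nslookup backend 127.0.0.11
docker exec -it nginx nslookup ui 127.0.0.11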
The things I tried:

1. A resolver with a short validity:

   resolver 127.0.0.11 ipv6=off valid=10s;

2. Referencing the upstream through a variable in proxy_pass:

   server {
       <...>
       set $mybackend "backend:3000";
       location /backend/ {
           proxy_pass http://$mybackend;
       }
   }

3. A map inside the http block:

   http {
       map "" $mybackend {
           default backend:3000;
       }
       server {
           ...
       }
   }

4. resolver local=true
None of these had any effect at all. The DNS cache is only flushed if I reload the nginx configuration inside the container OR restart the container manually.
My current workaround is to use a static Docker network declared in docker-compose.yml. But this has its cons too.
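A minimal sketch of that workaround, with an assumed subnet and assumed image names, looks roughly like this in docker-compose.yml (the obvious downside is that the addresses now have to be maintained by hand):

services:
  backend:
    image: my-backend-image   # assumed image name
    networks:
      app_net:
        ipv4_address: 172.28.0.10
  ui:
    image: my-ui-image        # assumed image name
    networks:
      app_net:
        ipv4_address: 172.28.0.11
  nginx:
    image: nginx:1.20
    networks:
      app_net:
        ipv4_address: 172.28.0.2

networks:
  app_net:
    ipam:
      config:
        - subnet: 172.28.0.0/16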
Nginx version used: 1.20.0 (latest as of now)
OpenResty versions used: 1.13.6.1 and 1.19.3.1 (latest as of now)
Would appreciate any thoughts
UPDATE 2021-09-08: A few months later I am back to solving this same issue, still with no luck. It really looks like a bug in nginx: I cannot make nginx re-resolve the DNS names. There seems to be no timeout on nginx's DNS cache, and none of the options listed above to introduce timeouts or trigger a DNS flush work.
UPDATE 2022-01-11: I think the problem is really in nginx. I tested my config in many ways a couple of months ago, and it looks like something else in my nginx.conf prevents the valid parameter of the resolver directive from working properly. It is either the limit_req_zone or the proxy_cache_path directive, used for request rate limiting and caching respectively. These just don't play nicely with the valid parameter for some reason, and I could not find any information about this anywhere in the nginx docs.
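If it helps anyone reproduce this, a minimal sketch for bisecting the hypothesis (zone names, sizes and paths below are made up) would be to start from only the resolver and the variable-based proxy_pass, confirm that re-resolution works, and then re-add the two suspect directives one at a time:

http {
    resolver 127.0.0.11 ipv6=off valid=10s;

    # Suspects, to be re-added one at a time:
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;
    proxy_cache_path /var/cache/nginx keys_zone=appcache:10m max_size=100m;

    server {
        listen 80;

        location /backend/ {
            set $mybackend "backend:3000";
            proxy_pass http://$mybackend;
        }
    }
}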
I will get back to this later to confirm my hypothesis.
Upvotes: 19
Views: 11859
Reputation: 11
I have the same experience and did not find a solution. What you can do, though, to "automate" the remedy is to monitor the IP change and send a HUP to nginx, with something like this (works on Docker Swarm):
#!/bin/bash

# Docker network and container names
DOCKER_NETWORK="my_network"
CONTAINER_NAME="my_container"

# File to store the last known IP address
LAST_IP_FILE="/tmp/last_ip.txt"

# Function to get the current IP address of the container
get_container_ip() {
    docker network inspect "$DOCKER_NETWORK" | jq -r ".[].Containers[] | select(.Name | contains(\"$CONTAINER_NAME\")) | .IPv4Address" | cut -f1 -d'/'
}

# Main monitoring loop
while true; do
    # Get the current IP address of the container
    CURRENT_IP=$(get_container_ip)

    # Proceed only if we got a valid IP address
    if [ -n "$CURRENT_IP" ]; then
        # Read the last known IP address
        if [ -f "$LAST_IP_FILE" ]; then
            LAST_IP=$(cat "$LAST_IP_FILE")
        else
            LAST_IP=""
        fi

        # Compare the current IP with the last known IP
        if [ "$CURRENT_IP" != "$LAST_IP" ]; then
            # Log the change or take any action you need
            echo "IP change detected for $CONTAINER_NAME: $LAST_IP -> $CURRENT_IP"
            docker kill -s HUP "$(docker ps -q --filter name=my_nginx)"

            # Update the last known IP address
            echo "$CURRENT_IP" > "$LAST_IP_FILE"
        fi
    else
        echo "No valid IP address found for $CONTAINER_NAME."
    fi

    # Wait for a specified interval before checking again
    sleep 60
done
Upvotes: 0
Reputation: 1
After a long search I found a solution for uwsgi_pass. The same should work for proxy_pass.
resolver 127.0.0.11 valid=10s;
set $upstream_endpoint ${UWSGI_ADDR};

location / {
    uwsgi_pass $upstream_endpoint;
    include uwsgi_params;
}
where UWSGI_ADDR is the name of your application container with its port, e.g. app:8000.
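A minimal sketch of the equivalent proxy_pass variant (the service name backend:3000 is just an example) would then be:

# fragment inside a server block
resolver 127.0.0.11 valid=10s;
set $upstream_endpoint http://backend:3000;   # example service name and port

location / {
    proxy_pass $upstream_endpoint;
}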
UPD: In fact, this follows from the proxy_pass documentation:
Parameter value can contain variables. In this case, if an address is specified as a domain name, the name is searched among the described server groups, and, if not found, is determined using a resolver.
You can also find some useful information in the section "Setting the Domain Name in a Variable" of the blog post authored by one of the nginx developers.
Upvotes: 0
Reputation: 8452
I was struggling with exactly the same thing (Docker Swarm), and to make it work I had to drop the upstream block from my configuration.
Something that works well (tested 5 minutes ago on nginx 1.22):
location ~* /api/parameters/(.*)$ {
    resolver 127.0.0.11 ipv6=off valid=1s;
    set $bck_parameters parameters:8000;
    proxy_pass http://$bck_parameters/api/$1$is_args$args;
}
where $bck_parameters is NOT an upstream but the real server behind it.
Doing the same thing with an upstream block will fail.
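For contrast, here is a sketch of the variant that kept failing for me: with an upstream block, open-source nginx resolves the name only once, when the configuration is loaded, so the cached IP goes stale after the container is recreated:

# upstream goes in the http block; the name is resolved once at config load
upstream parameters_up {
    server parameters:8000;
}

# inside the server block
location ~* /api/parameters/(.*)$ {
    proxy_pass http://parameters_up/api/$1$is_args$args;
}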
Upvotes: 3
Reputation: 406
TL;DR: Your Internet provider may be caching DNS responses with no respect for tiny TTL values (like 1 second).
I've been trying to retest the same thing locally.
nslookup is your friend here: you can query each DNS server between nginx and the root DNS server.
Something very easy to reproduce (without setting up a local DNS server):
Create a Route 53 'A' record with a TTL of 1 second and try to query the AWS DNS server for your hosted zone (it will be something like ns-239.awsdns-29.com). Play around with the dig / nslookup commands:
nslookup
set type=a
server ns-239.awsdns-29.com
your.domain.com
It will return the IP you have set.
Change the Route 53 'A' record to some other IP.
Use dig / nslookup and make sure you see the change immediately.
Then set the resolver in nginx to the AWS DNS server name (for testing purposes only). If that works, it means the DNS response is being cached elsewhere and this is no longer an nginx issue!
In my case it was a Sunrise Wi-Fi router, which only began to see the new IP after I restarted it (I assume it would have picked it up eventually, after some longer cache timeout).
A great help when debugging this is an nginx compiled with --with-debug; then in the nginx logs you can see whether a given DNS name was resolved and to what IP.
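For instance, you can check the build flags and switch the error log to debug level like this (the log path is just an example):

# Check whether the running binary was built with debug support
nginx -V 2>&1 | grep -o -- '--with-debug'

# Then, in nginx.conf, raise the error log to debug level and watch the
# resolver entries it writes for each lookup:
#   error_log /var/log/nginx/error.log debug;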
My whole config looks like this (here with the standard Docker resolver, which has to be set if you are using variables in proxy_pass!):
server {
    listen 0.0.0.0:8888;
    server_name nginx.my.custom.domain.in.aws;
    resolver 127.0.0.11 valid=1s;

    location / {
        proxy_ssl_server_name on;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header Host $host;
        set $backend_servers my.custom.domain.in.aws;
        proxy_pass https://$backend_servers$request_uri;
    }
}
Then you can try to test it with:
curl -L http://nginx.my.custom.domain.in.aws:8888 --resolve nginx.my.custom.domain.in.aws:8888:127.0.0.1
Upvotes: 3
Reputation: 99
Maybe it's because nginx's DNS re-resolution for upstream servers only works in the commercial version, NGINX Plus?
https://www.nginx.com/products/nginx/load-balancing/#service-discovery
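If NGINX Plus is available, the feature described on that page is the resolve parameter on an upstream server entry; a rough sketch (the service name, port and zone size are assumptions):

resolver 127.0.0.11 valid=10s;

upstream backend_up {
    zone backend_up 64k;          # shared memory zone, required for re-resolution
    server backend:3000 resolve;  # NGINX Plus: periodically re-resolve the hostname
}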
Upvotes: 4