Reputation: 4113
I've got a website running on Amazon Web Services that is deployed using Elastic Beanstalk and runs on a single EC2 micro instance. It is a staging environment and I'm the only person with access to it. Using Apache JMeter, I simulate six users navigating the website, averaging about one request every 3 seconds in total (images, CSS, JS and other static resources are served by CloudFront and generate no traffic on the EC2 instance).
The problem is that after a while (usually 30-60 minutes after the environment is set up), the website stops responding. I'm sure that Tomcat is still running properly, since I can see in the log (catalina.out) that cron jobs are still being executed. It seems that it is only the ELB that cannot serve the response.
Analysing the logs, there are no errors at all on Tomcat (none in /opt/tomcat7/logs/tail_catalina.log or /opt/tomcat7/logs/catalina.out). The following errors start appearing on /etc/httpd/logs/elasticbeanstalk-error_log as soon as the website becomes unreachable:
[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:26:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:32:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:32:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
... until the EC2 instance is finally terminated (and a new one is automatically started).
This problem doesn't happen if I don't make any requests (or if I make fewer).
Any help greatly appreciated.
Thanks!
Upvotes: 2
Views: 6126
Reputation: 9853
I've just spent a day battling a similar problem to this one. I have a WAR file deployed to an Amazon Elastic Beanstalk environment. The difference in my case was that the instance spun up by the AEBS environment lasted only 5 minutes before AEBS terminated it and replaced it with a new one.
After rather a lot of digging (in 5 minute chunks while my instance was still alive) and some light reading I found that AEBS Tomcat instances are created with Apache receiving requests on port 80. Requests to /_hostmanager are re-routed to port 8999 and anything else to port 8080 (Tomcat). A Ruby application called 'hostmanager' deployed to the instance listens on port 8999. This application presumably reports back to the AWS Elastic Beanstalk Host Manager with traffic and other statistics, so that the Elastic Beanstalk environment can get a picture of the load and scale the number of instances up or down appropriately.
If the AWS Elastic Beanstalk Host Manager gets no response from an instance's hostmanager application then it will terminate the instance and fire up a new one. This may be why your site lasts 30 minutes and then dies.
So I guess the problem here lies not with your Java application being served up on port 8080 but with the hostmanager application not listening on port 8999. This is probably what is causing:
[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
Check out /opt/elasticbeanstalk/var/log/hostmanager.log as it might give you more clues as to what is going on and why the hostmanager application is unhappy.
In my case it turned out that my hostmanager app was running a wget to an Amazon S3 Storage bucket and was getting a 404 response (I found this out from looking at the hostmanager.log mentioned above). This was causing the hostmanager to fail to start up. Hence when an incoming request got re-routed to port 8999 nothing was listening. Fail. Instance terminated.
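To line up hostmanager.log with the Apache error log, it can help to know when the refusals on port 8999 started. The following is only a sketch, assuming the error log format shown in the question; `first_refused` is a hypothetical helper name, not something that ships with Elastic Beanstalk.

```shell
#!/bin/sh
# first_refused PORT
# Hypothetical helper: reads an Apache error log on stdin and prints the
# timestamp of the first "Connection refused" entry for 127.0.0.1:PORT.
# Prints nothing if no such entry exists.
first_refused() {
    # The trailing space after the port avoids matching e.g. 89990.
    grep -F "Connection refused" |
        grep -F "127.0.0.1:$1 " |
        head -n 1 |
        sed 's/^\[\([^]]*\)].*/\1/'   # keep only the text inside the first [...]
}
```

For example, `first_refused 8999 < /etc/httpd/logs/elasticbeanstalk-error_log` prints the time of the first refusal, which you can then search for in hostmanager.log.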
Rather than trying to work out exactly why the hostmanager application was failing, I decided to treat the AMI being used by the Elastic Beanstalk environment as a lost cause. I ended up abandoning it and taking the following steps to get a new Elastic Beanstalk environment running off a custom AMI:
Without knowing exactly what your set up is it is a little hard to help out precisely. Though hopefully a combination of knowing that the hostmanager listens on port 8999, the location of the hostmanager.log and some luck will get you where you want to be!
Upvotes: 1
Reputation: 3588
Let me start with an assumption:
If that's true, the log events:
[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
..suggest that the application listener died. You can confirm this with a:
curl -v http://127.0.0.1:8999/
That curl command should return a valid HTTP response when the site is operating normally, and will probably return a "Connection refused" or "couldn't connect to host" error when you're experiencing the outage. You can also use the following command to check for a valid listener on the application port:
netstat -an | grep LISTEN | grep 8999
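One caveat with the plain grep above is that it also matches other ports containing the same digits (18999, for example). A slightly stricter sketch, assuming Linux-style `netstat -an` output where the local address is the fourth column and the state is the last; `has_listener` is a hypothetical helper name:

```shell
#!/bin/sh
# has_listener PORT
# Hypothetical helper: reads `netstat -an` output on stdin and succeeds
# (exit status 0) only if a socket is in the LISTEN state on exactly PORT.
has_listener() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            # Local address looks like 0.0.0.0:8999 or :::8999;
            # the port is the last piece after splitting on ":" or ".".
            n = split($4, parts, "[:.]")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of `netstat -an | has_listener 8999 && echo "listening" || echo "nothing on 8999"`.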
There are a number of reasons why the application listener could die, including but not limited to:
- The JVM process has exited (use ps to see if the JVM process is still running)
- The process has run out of file descriptors (check lsof | wc -l and compare to ulimit -n of the application user)
However, most errors should result in an error message being written to the JVM process's stderr, which is normally logged. That's the best place to look. If all else fails, you may want to try running your Tomcat application in the foreground with debug logging enabled.
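On the file descriptor check: lsof | wc -l counts open files across the whole system, while ulimit -n is a per-process limit, so a per-process count is easier to compare. A rough sketch for Linux, assuming /proc is mounted and you have permission to read the JVM's entries (same user or root); `fd_usage` and its optional second argument are hypothetical, the latter existing only so the function can be pointed at a different directory when trying it out:

```shell
#!/bin/sh
# fd_usage PID [PROC_ROOT]
# Hypothetical helper: prints "<open fds> <soft limit>" for one process,
# using /proc/PID/fd and /proc/PID/limits (Linux only).
fd_usage() {
    proc_dir="${2:-/proc}/$1"
    # Each entry in the fd directory is one open descriptor.
    open=$(ls "$proc_dir/fd" | wc -l | tr -d ' ')
    # The "Max open files" row lists the soft limit in the 4th column.
    limit=$(awk '/^Max open files/ { print $4 }' "$proc_dir/limits")
    echo "$open $limit"
}
```

Running `fd_usage <pid of the JVM>` a few times during the JMeter run would show whether the first number is creeping towards the second.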
Upvotes: 7