Reputation: 20210
HAProxy is pinging Tomcat and requesting a very small page, causing Tomcat to create a new session every 2 seconds. Is there a way to programmatically (or through a config) tell Tomcat not to create a new session for a specific page?
Upvotes: 3
Views: 10156
Reputation: 33
Just add a session="false" directive to the JSP:
<%@ page session="false"%>
Upvotes: 2
Reputation: 1218
Don't point it at your application pages. If httpchk is pulling a JSP page, it will create a session.
Can you point HAProxy at a static HTML page for your httpchk?
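For instance, a minimal HAProxy backend along these lines (the file name check.html and addresses are made up for illustration) would poll a static file that Tomcat serves without ever touching the session machinery:

```
backend tomcat_servers
    option httpchk GET /check.html
    server tomcat1 10.0.0.1:8080 check inter 2000
```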
Alternatively, create a servlet that performs whatever health check you need to validate but doesn't create a session (à la HttpServletRequest.getSession(false)).
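A minimal sketch of such a health-check servlet might look like this (the class name and URL pattern are illustrative, not from the question; the key point is that request.getSession() is never called, so no session is created):

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical health-check servlet: it never touches the session API,
// so Tomcat never creates a session for HAProxy's httpchk requests.
@WebServlet("/healthcheck")
public class HealthCheckServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Validate whatever health you need here (DB connection, disk space, ...).
        resp.setContentType("text/plain");
        resp.getWriter().print("OK");
    }
}
```

Unlike a JSP, a servlet creates a session only when you explicitly ask for one, so simply not calling getSession() is enough here.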
Upvotes: 3
Reputation: 652
You don't need to implement anything; it's already there ;)!
Tomcat provides the Crawler Session Manager Valve (a valve is similar to a servlet Filter, but runs inside the Tomcat container at a lower level). You can find more details here: http://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve
You just add a <Valve> element to your Tomcat's server.xml with the proper configuration. Remember to provide regular expressions matching the bot user agents.
For example
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*googlebot.*|.*yahoo.*" sessionInactiveInterval="600"/>
You can look at the source code of valve: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.tomcat/tomcat-catalina/7.0.11/org/apache/catalina/valves/CrawlerSessionManagerValve.java
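If you want to sanity-check the crawlerUserAgents expression before deploying, you can test it against sample User-Agent strings with plain java.util.regex (a standalone sketch; the sample User-Agent strings are illustrative, and note the matching is case-sensitive):

```java
import java.util.regex.Pattern;

public class CrawlerRegexCheck {
    public static void main(String[] args) {
        // Same expression as in the <Valve> example above
        Pattern crawler = Pattern.compile(".*googlebot.*|.*yahoo.*");

        String bot = "Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)";
        String browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

        System.out.println(crawler.matcher(bot).matches());     // true
        System.out.println(crawler.matcher(browser).matches()); // false
    }
}
```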
Upvotes: 13
Reputation: 5666
Yes, there is. It's a bit complicated, but it works well for us.
Basically, we change the filter chain for sessions. We do this for bots (Google, Pear, Yahoo).
Create a new Filter, register it, and then use this source for the Filter class:
import java.io.IOException;
import java.util.HashMap;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class BotFilter implements Filter {

    private int inactive_seconds = 5 * 60;

    private String[] bots = new String[] {
            "googlebot",          // Google
            "msnbot",             // MSN
            "slurp",              // Yahoo
            "libcurl",            // curl, sometimes used with bigbrother
            "bigbrother",         // bigbrother availability check
            "whatsup",            // whatsup availability check
            "surveybot",          // unknown
            "wget",               // no comment
            "speedyspider",       // http://www.entireweb.com/about/search_tech/speedyspider/
            "nagios-plugins",     // all Nagios checks
            "pear.php.net",       // some PHP stuff
            "mj12bot",            // http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
            "bingbot",            // M$ Bing
            "dotbot",             // "We are just a few Seattle based guys trying to figure out how to make internet data as open as possible."
            "aggregator:spinn3r", // http://spinn3r.com/robot
            "baiduspider"         // http://www.baidu.com/search/spider.htm
    };

    private HashMap<String, HttpSession> botsessions;

    public BotFilter() {
        this.botsessions = new HashMap<String, HttpSession>();
    }

    public void init(FilterConfig config) throws ServletException {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        if (request instanceof HttpServletRequest) {
            HttpServletRequest httprequest = (HttpServletRequest) request;
            try {
                String useragent = httprequest.getHeader("User-Agent");
                if (useragent == null) {
                    // No user agent at all: send the client away and stop processing
                    ((HttpServletResponse) response).sendRedirect("http://www.google.com");
                    return;
                }
                useragent = useragent.toLowerCase();
                for (int i = 0; i < this.bots.length; i++) {
                    if (useragent.indexOf(this.bots[i]) > -1) {
                        String key = httprequest.getRemoteAddr() + useragent;
                        boolean sessionIsInvalid = false;
                        synchronized (this.botsessions) {
                            try {
                                // getAttributeNames() throws IllegalStateException on an invalidated session
                                if (this.botsessions.get(key) != null)
                                    this.botsessions.get(key).getAttributeNames();
                            } catch (java.lang.IllegalStateException ise) {
                                sessionIsInvalid = true;
                            }
                            if (this.botsessions.get(key) == null || sessionIsInvalid) {
                                httprequest.getSession().setMaxInactiveInterval(this.inactive_seconds);
                                if (sessionIsInvalid)
                                    this.botsessions.remove(key); // Remove first, if in there
                                this.botsessions.put(key, httprequest.getSession()); // Then add a little spice
                            } else {
                                next.doFilter(new BotFucker(httprequest, this.botsessions.get(key)), response);
                                return;
                            }
                        }
                    }
                }
            } catch (Exception e) {
                // Error handling code
            }
        }
        next.doFilter(request, response);
    }

    public void destroy() {
    }
}
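Registering the filter in web.xml might look like this (the filter name, package, and URL pattern are illustrative):

```xml
<filter>
    <filter-name>BotFilter</filter-name>
    <filter-class>com.example.BotFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>BotFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```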
And this little one for the redirection class:
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import javax.servlet.http.HttpSession;

public class BotFucker extends HttpServletRequestWrapper {

    HttpSession session;

    public BotFucker(HttpServletRequest request, HttpSession session) {
        super(request);
        this.session = session;
    }

    @Override
    public HttpSession getSession(boolean create) {
        return this.session;
    }

    @Override
    public HttpSession getSession() {
        return this.session;
    }
}
These two classes re-use the session a bot had before, if it connects again from the same IP within the given time limit. We're not 100% sure what this does to the data the bot receives, but this code has been running for many months now and has solved our problem (multiple connects/sessions per second per IP from Google).
And before somebody tries to help: the problem has been submitted to Google multiple times via the Webmaster interface. The crawling interval has been lowered to the lowest possible setting, and the problem spawned a 3x reply thread on the appropriate forum without any hint as to why it exists.
Upvotes: 4
Reputation: 2352
Is this page part of a regular web app or a standalone web app?
If the page is part of a standalone web app, i.e. only that page exists in that app, you could set a low session timeout in that app's web.xml. Like 2 minutes or even lower.
Otherwise, there is no solution to your problem on the Tomcat end: Tomcat will create a new session if an earlier session id is not sent along with the request.
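In web.xml that would look like this (session-timeout is given in minutes):

```xml
<session-config>
    <session-timeout>2</session-timeout>
</session-config>
```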
Upvotes: 1