Reputation: 1749
Introduction
I've found plenty of information about the Too many open files exception on the Web, but I couldn't solve this strange case. As I've read, the exception is thrown when the per-process limit on open file descriptors defined by the OS is exceeded. These descriptors are diverse in nature: they can refer to regular files, sockets, etc. And I've found robust and safe ways to open files, which I have implemented in my Java application.
The application is a short program that downloads Web pages and runs them through the Boilerpipe algorithm. This way I get the most representative content of each site. Then I write that content to disk in an appropriate format (TREC format). The URLs of these websites are taken from a MySQL database that I access using the JDBC connector.
So, I think that the exception can be thrown from three different places: the database access, the HTTP connection, and the file writing. Although, as I said, I think that I open and write those files in a correct way.
Problem
There are thousands of URLs to process, and the exception is thrown after a while (which also makes it very difficult to debug...). I don't know if it matters, but the URLs are classified into different categories, and I run different instances of the program to speed up the whole process. The categories don't overlap, so there shouldn't be any problem.
Code
To make it more readable, I'm going to show simplified versions of just those three parts of my code:
Database access
// Connect to database
Connection dbconn = null;
try {
    String dbUrl = "jdbc:mysql://" + dbServer + "/" + dbName;
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    dbconn = DriverManager.getConnection(dbUrl, dbUser, dbPass);
    System.out.println("Database connection established");
} catch (Exception e) {
    e.printStackTrace();
    System.err.println("Cannot connect to database server");
    System.exit(-1);
}

System.out.println(" Downloading category: " + category);

Statement s = null;
try {
    s = dbconn.createStatement();
} catch (SQLException e) {
    System.err.println("Error on creating the statement");
    e.printStackTrace();
    System.exit(-1);
}

String q = "SELECT resource,topic FROM " +
           "content_links " +
           "WHERE topic LIKE 'Top/" + category + "%';";

try {
    s.executeQuery(q);
} catch (Exception e) {
    System.err.println("Error on executing the SQL statement");
    e.printStackTrace();
    System.exit(-1);
}

ResultSet rs = null;
try {
    rs = s.getResultSet();
} catch (SQLException e) {
    System.err.println("Error on getting the result set");
    e.printStackTrace();
    System.exit(-1);
}

int count = 0, webError = 0;

// work with the result set
try {
    while (rs.next()) {
        // MAIN LOOP
    }
} catch (SQLException e) {
    System.err.println("Error on getting next item");
    e.printStackTrace();
    System.exit(-1);
}

// Close connection to database
if (dbconn != null) {
    try {
        dbconn.close();
        System.out.println(" Database connection terminated");
    } catch (Exception e) { /* ignore close errors */ }
}
HTTP connection, title extraction, and Boilerpipe filtering
try {
    String title = "";
    org.jsoup.nodes.Document doc = Jsoup.connect(urlVal).get();

    for (Element element : doc.select("*")) {
        if (element.tagName().equalsIgnoreCase("title")) {
            title = element.text();
        }
        if (!element.hasText() && element.isBlock()) {
            element.remove();
        }
    }

    String contents = NumWordsRulesExtractor.INSTANCE.getText(doc.text());
    storeFile(id, urlVal, catVal, title, contents);
} catch (BoilerpipeProcessingException e) {
    System.err.println("Connection failed to: " + urlVal);
} catch (MalformedURLException e1) {
    System.err.println("Malformed URL: " + urlVal);
} catch (Exception e2) {
    System.err.println("Exception: " + e2.getMessage());
    e2.printStackTrace();
}
Writing file
private static void storeFile(String id, String url, String cat, String title, String contents) {
    BufferedWriter out = null;
    try {
        out = new BufferedWriter(
                new OutputStreamWriter(
                        new FileOutputStream(
                                new File(path + "/" + id + ".webtrec")), "UTF8"));
        // write in TREC format
        out.write("...");
    } catch (IOException e) {
        System.err.println("Error: " + e.getMessage());
        e.printStackTrace();
    } finally {
        try {
            out.close();
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Upvotes: 1
Views: 8752
Reputation: 718758
Yup. You are leaking file descriptors.
In the first case you open a DB connection and don't reliably close it: the close() call is not in a finally block, so any exception along the way skips it. The connection will typically use a socket or something similar to talk to the database. While the connection stays open, so does the socket, and you leak a file descriptor.
In the second case, I suspect that the call to Jsoup.connect(urlVal) is opening a connection, which you don't then close. That would result in a file descriptor leak.

Correction - there is no close() method on Jsoup's Connection interface. It looks like the actual connection must be created and then closed internally by the get method. Assuming that is so, there is no file descriptor leak in the second case.
The third case does not leak file descriptors. However, if you fail to open the file, the out.close() statement will attempt to call a method on null ... and will throw an NPE.
The solution is to find all of the places where you open files, database connections, and HTTP connections, and make sure that the handle is always closed.
One way to do it is to put the close() call (or equivalent) in a finally block ... but make sure that you don't accidentally call close() on null.
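For example, here is a sketch of that pattern applied to the storeFile method from the question (the null check prevents the NPE described above):

private static void storeFile(String id, String url, String cat,
                              String title, String contents) {
    BufferedWriter out = null;
    try {
        out = new BufferedWriter(
                new OutputStreamWriter(
                        new FileOutputStream(
                                new File(path + "/" + id + ".webtrec")), "UTF8"));
        out.write("...");    // write in TREC format
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (out != null) {    // don't call close() on null
            try {
                out.close();
            } catch (IOException e) { /* ignore close errors */ }
        }
    }
}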
The other way to do it is to use the Java 7 try-with-resources syntax. For example:
private static void storeFile(String id, String url, String cat,
                              String title, String contents) {
    try (BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(
                    new FileOutputStream(
                            new File(path + "/" + id + ".webtrec")), "UTF8"))) {
        // write in TREC format; out is closed automatically
        out.write("...");
    } catch (IOException e) {
        System.err.println("Error: " + e.getMessage());
        e.printStackTrace();
    }
}
(Note however that the Java 7 syntax can only be used with resources that implement the AutoCloseable interface, which Closeable extends.)
Upvotes: 3
Reputation: 347194
To add to Stephen's comprehensive analysis: I recommend using a connection pool for the database. As Stephen has pointed out, unless you close those connections you'll still drain the pool, but at least it will be easier to discover why...
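For illustration, a minimal sketch of what pooling might look like. The choice of HikariCP and the pool size are my assumptions, not part of the original code; any pooling library would do:

// Assumes the HikariCP library (com.zaxxer.hikari) is on the classpath.
// dbServer, dbName, dbUser, dbPass and q come from the question's code.
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://" + dbServer + "/" + dbName);
config.setUsername(dbUser);
config.setPassword(dbPass);
config.setMaximumPoolSize(10);    // assumed size; tune to your workload
HikariDataSource ds = new HikariDataSource(config);

// try-with-resources returns the connection to the pool, so a
// forgotten close() call can't silently drain it.
try (Connection conn = ds.getConnection();
     Statement s = conn.createStatement();
     ResultSet rs = s.executeQuery(q)) {
    while (rs.next()) {
        // MAIN LOOP
    }
} catch (SQLException e) {
    e.printStackTrace();
}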
I've not seen any evidence of it, but you should be using some kind of thread pool to download the pages; this will help to maximize the resources of the system. Some kind of ExecutorService would suffice. Like I say, you're probably already doing this, but you didn't show any code (or comment) for it. A sketch of the idea follows.
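In this sketch the pool size is arbitrary and processUrl(...) is a hypothetical method standing in for the download-and-store logic shown in the question:

// Assumes java.util.concurrent.* imports. 8 threads is an arbitrary choice.
final ExecutorService pool = Executors.newFixedThreadPool(8);
while (rs.next()) {
    final String urlVal = rs.getString("resource");
    pool.submit(new Runnable() {
        public void run() {
            processUrl(urlVal);    // hypothetical: Jsoup fetch + storeFile
        }
    });
}
pool.shutdown();
try {
    pool.awaitTermination(1, TimeUnit.HOURS);    // wait for queued downloads
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}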
Upvotes: 2