Silverknight

Reputation: 1

How to append a SQL query result to an existing file in Java?

I am currently working on Java code that allows me to query a database and extract its content to a file.

So far, there is no problem with small queries.

But I will quickly have to extract large volumes of data, and I have been trying for a few days to implement the most efficient solution to limit memory consumption as much as possible.

As soon as I make a large request, the memory of both the source machine and the target machine is saturated.

The Java version I use on the Red Hat Linux environment is java-1.8.0.

So far, I have been able to redirect the result of my query to a file. But after reading a lot of documentation, I can see that there are many different methods to limit memory consumption.

DriverManager.registerDriver(new com.wily.introscope.jdbc.IntroscopeDriver());
Connection conn = DriverManager.getConnection(
        "jdbc:introscope:net//" + user + ":" + password + "@" + hostname + ":" + port);

String query = "select * from metric_data"
        + " where agent='" + agents_filter
        + "' and metric='" + metrics_filter
        + "' and timestamp between " + queryInterval;

Statement ps = conn.createStatement();
ResultSet rs = ps.executeQuery(query);

rs.setFetchSize(Size);
ResultSetMetaData rsm = rs.getMetaData();
File output = new File("result");
PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(output), "UTF-8")), false);

for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
    String colName = rs.getMetaData().getColumnName(i);
    out.print(" " + colName + "\t\t" + "|");
}

while (rs.next()) {
    for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
        String colValue = rs.getString(i);
        out.print(" " + colValue + "\t" + "|");
    }
    out.println();
}

out.close();
out.flush();
rs.close();
ps.close();
conn.close();

Currently the result is fully loaded into memory and then redirected to my file. But as soon as the query is too large I get the following messages:

Exception in thread "PO:client_main Mailman 2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "UnknownHub Hub Receive 1" java.lang.OutOfMemoryError: Java heap space

I would like to be able to write, for example, 1000 lines at a time to the file so as not to saturate the memory.

Knowing that files can sometimes reach 40 GB.

The execution time is not really a problem, but memory consumption is a really important criterion.

I am far from being a Java professional, so I would need a little help from you.

Thank you in advance for your time

Upvotes: 0

Views: 1876

Answers (1)

rzwitserloot

Reputation: 103073

  1. Constructing your SQL string by concatenating strings is a security vulnerability (SQL injection). Imagine those variables hold something like: "1'; DROP ALL TABLES; --". Even if here you know the strings are 'safe', code changes, and you should not adopt bad habits. Fix this; you can use PreparedStatement to fix it (see the first sketch after this list).

  2. Metadata isn't free. Cache that stuff. Specifically, cache the value of rs.getMetaData().getColumnCount() outside the loops (see the second sketch after this list).

  3. For real speed here, run a SQL command that tells the DB engine to dump that data directly to a file, and then transfer the file if it's not on the local host. You can't really go any faster than that (see the third sketch after this list for what that looks like on a mainstream engine).

  4. You can't flush after close, and close implies flush. You can just remove the flush() line.

  5. Assuming your fetch size isn't ludicrously large, there's nothing in this code that would indicate an out-of-memory error. So it's either the repeated invocations of getMetaData (which means caching the column count would fix your problem here), or the DB engine and/or its JDBC driver is badly written. I haven't heard of Introscope, which is why I mention it. If that is the case, at best you can use SQL OFFSET and LIMIT to split your query into 'pages' and thus not grab too many results at once (see the last sketch after this list), but without an ORDER BY in your SQL, technically the DB engine is allowed to change the order on you, and with it, the process might become quite slow.
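
A minimal sketch of point 1, assuming the Introscope driver supports standard ? placeholders (not every third-party JDBC driver does). The variable names are the question's; intervalStart and intervalEnd are hypothetical stand-ins for the two ends of queryInterval:

String query = "select * from metric_data"
        + " where agent = ? and metric = ?"
        + " and timestamp between ? and ?";

// Values are bound by the driver, never spliced into the SQL text,
// so input like "1'; DROP ALL TABLES; --" cannot alter the query.
PreparedStatement ps = conn.prepareStatement(query);
ps.setString(1, agents_filter);
ps.setString(2, metrics_filter);
ps.setLong(3, intervalStart); // hypothetical: start of queryInterval
ps.setLong(4, intervalEnd);   // hypothetical: end of queryInterval
ResultSet rs = ps.executeQuery();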
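
A sketch covering points 2 and 4: the column count is cached once, and try-with-resources closes (and therefore flushes) the writer, so the explicit flush() disappears:

ResultSetMetaData rsm = rs.getMetaData();
int columnCount = rsm.getColumnCount(); // cached once, not re-fetched per row

try (PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream("result"), "UTF-8")))) {
    for (int i = 1; i <= columnCount; i++) {
        out.print(" " + rsm.getColumnName(i) + "\t\t" + "|");
    }
    out.println();
    while (rs.next()) {                 // rows stream through one at a time
        for (int i = 1; i <= columnCount; i++) {
            out.print(" " + rs.getString(i) + "\t" + "|");
        }
        out.println();
    }
} // close() implies flush(); no separate flush() call needed
rs.close();
ps.close();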
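
Point 3 depends entirely on the DB engine, and Introscope's dialect is unknown to me, so purely as an illustration: on PostgreSQL a server-side dump looks like this (the file is written on the database host, not the client):

try (Statement st = conn.createStatement()) {
    // PostgreSQL-specific syntax; Introscope almost certainly spells this
    // differently, if it supports a server-side dump at all.
    st.execute("COPY metric_data TO '/tmp/result.csv' WITH (FORMAT csv, HEADER)");
}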
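
And a sketch of the paging fallback from point 5, assuming the engine accepts standard LIMIT/OFFSET (check the driver's documentation) and using the 1000-row pages the question asks for. pageSize and offset are plain ints built here, not user input, so concatenating them is safe:

int pageSize = 1000;
int offset = 0;
boolean more = true;
while (more) {
    // Hypothetical paging clause; exact syntax varies per DB engine.
    String paged = query + " order by timestamp"
            + " limit " + pageSize + " offset " + offset;
    try (Statement st = conn.createStatement();
         ResultSet page = st.executeQuery(paged)) {
        int rows = 0;
        while (page.next()) {
            rows++;
            // ... write the row to the file exactly as in the sketch above ...
        }
        more = (rows == pageSize); // a short page means we've run out of rows
    }
    offset += pageSize;
}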

Upvotes: 1
