Reputation: 4944
I am creating an application in Java that uses SQLite to store and search the data in a database.
I am not sure if I am approaching this problem in the most efficient way and I figured someone here could help me out with that.
Background info: My Java application parses .PDF files using a library that can transform the raw text from the PDF files into a StringWriter. I then parse the resulting data and get the info I need to create some new rows in my database. The resulting tables are very large, though, as there are about 900 PDF files to parse. Just to give you an idea of how large I'm talking, one of the tables ends up with about 145,000 rows, another with 1,550 rows, and the others (3 or 4 other tables) with between 75 and 750 rows.
Everything works fine, but I'm not sure if I could lower the time required to create the tables. So far, on my laptop, it takes 41 minutes to create everything the first time through (though everything runs from a USB flash drive... I'll test it on an HDD later). It takes 1.5 minutes when I run it again, since it checks whether each file has already been parsed and doesn't re-create everything. I don't need a HUGE improvement, since ideally I'd run this program only once a week with about 30 files or so, but still, I'm wondering why it is so slow with 900 files: whether it's the code parsing the files that is slow, or bad practice on my end in the SQLite part. (I'm testing it with all the files created in the last year, which is why I have that many.)
So, what are the best practices to improve performance with SQLite in Java? Would it make a noticeable difference to put autocommit to false and to commit only once everything is created? Is there a way to create statements or to test if data already exists in a more efficient manner?
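Just to illustrate what I mean by turning autocommit off, the pattern I have in mind is roughly the sketch below (the table and column names are made up for the example, and it assumes the sqlite-jdbc driver is on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        // In-memory database for illustration; the real app would use a file path.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            conn.createStatement().execute(
                "CREATE TABLE Scores(league TEXT, playerID INT, score INT, date TEXT)");

            conn.setAutoCommit(false); // one transaction instead of one per INSERT
            String sql = "INSERT INTO Scores(league, playerID, score, date) VALUES (?,?,?,?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 1000; i++) {
                    ps.setString(1, "NHL");
                    ps.setInt(2, i);
                    ps.setInt(3, i * 2);
                    ps.setString(4, "2013-01-01");
                    ps.addBatch();      // queue the row instead of executing it immediately
                }
                ps.executeBatch();      // send all queued rows at once
            }
            conn.commit();              // a single commit for the whole batch
        }
    }
}
```

Is something like this the right direction, or is there a better pattern?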
I don't have my code with me, but the queries look kinda like this:
public static void insertScores(String league, int playerID, int score, String date)
{
    // connection is an open java.sql.Connection
    PreparedStatement ps = connection.prepareStatement("INSERT INTO Scores VALUES (?,?,?,?)");
    ps.setString(1, league);
    [...]
    ps.executeUpdate();
}
On other queries, I test to see if the row already exists using something like this:
public static void insertScores(int playerID)
{
    PreparedStatement ps = connection.prepareStatement("SELECT * FROM Scores WHERE ID = ?");
    ps.setInt(1, playerID);
    ResultSet rs = ps.executeQuery();
    if(!rs.next())
    {
        [code like in the first example]
    }
}
Keep in mind that any syntax errors are because I'm typing this from memory, since I don't have my code with me.
Just by seeing those examples and reading what I had to say, does anyone have any idea how to improve performance in my SQL statements?
Upvotes: 0
Views: 943
Reputation: 40884
USB flash drives have terrible performance when you make lots of small updates. Flash needs to read an entire block into a buffer, update the relevant part, erase the block, and then write the buffer back. (SSDs have logic to alleviate this somewhat.)
Move your data to HDD and see if it helps.
Upvotes: 0
Reputation: 6289
How many records in Scores would have the same playerID? If there are many, try to determine the presence of a specific playerID like this:
select 1 where exists(select 1 from scores where id = ?)
or similar. I'm not familiar with the SQL dialect used in SQLite, but this approach usually helps to short-circuit further computation as soon as the first record with the specified playerID is found.
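SQLite does support EXISTS directly. A sketch of that check from JDBC (assuming the sqlite-jdbc driver and the asker's Scores table with an ID column) might look like:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ExistsCheck {
    // Returns true if any row in Scores has the given playerID.
    // EXISTS stops scanning as soon as the first matching row is found.
    static boolean playerExists(Connection conn, int playerID) throws SQLException {
        String sql = "SELECT EXISTS(SELECT 1 FROM Scores WHERE ID = ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, playerID);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getInt(1) == 1;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            conn.createStatement().execute("CREATE TABLE Scores(ID INT, score INT)");
            conn.createStatement().execute("INSERT INTO Scores VALUES (7, 42)");
            System.out.println(playerExists(conn, 7)); // true
            System.out.println(playerExists(conn, 8)); // false
        }
    }
}
```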
Upvotes: 1
Reputation: 9938
Two suggestions:
1) Get a profiler. You can guess at what makes your code slow, or you can just profile it and know what makes it slow.
2) Since your data is on a slow device, you want to read/write as little as possible. SELECT * brings back the entire row, but then you only check for existence. Try SELECT ID, which only needs to read a single number.
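A sketch of that narrower check (assuming the sqlite-jdbc driver and the Scores table with an ID column from the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NarrowExistenceCheck {
    // Fetch only a constant instead of the whole row; LIMIT 1 lets SQLite
    // stop scanning at the first matching row.
    static boolean scoreExists(Connection conn, int playerID) throws SQLException {
        String sql = "SELECT 1 FROM Scores WHERE ID = ? LIMIT 1";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, playerID);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next(); // any row at all means the ID exists
            }
        }
    }
}
```

An index on Scores(ID) would make this check cheaper still, since SQLite could answer it from the index without touching the table at all.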
Upvotes: 2