Reputation: 33
I currently got a performance issue with an application that uses many sql select.
The programming language is java and I'm using a mysql database. It contains about 10 million records. What it needs to do is to find records in a database with zipcode and house number as parameters. When it does not find a record, it needs to do a query with only the zipcode and get the record with the lowest house number. When the zipcode cannot be found in the database the application needs to deal with this.
Thus the code for doing single queries looks like this:
Statement select = "select * from zipcode_addresses where zipcode = ? and houseNo =?";
ResultSet rs = select.executeQuery();
if(rs.next()) {
dealWithResult(rs);
}
else {
Statement alternativeSelect = "select * from zipcode_addresses where zipcode = ? group by houseNo having min(houseNo)";
ResultSet rs = alternativeSelect.executeQuery();
if(rs.next()) {
dealWithResult(rs);
} else {
System.err.println("Could not find zipcode :" + zipcode);
}
}
Is there a proper way of doing batch select queries which deals with data not being found?
Thanks!
Update
The table structure is the following:
+-----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| zipcode | varchar(6) | NO | PRI | NULL | |
| house_no | int(11) | NO | PRI | NULL | |
| sanddcode | varchar(45) | NO | | NULL | |
| depot | varchar(3) | NO | | NULL | |
| network_point | varchar(6) | NO | | NULL | |
| region | varchar(3) | NO | | NULL | |
| seq | int(11) | NO | | NULL | |
| cluster_id | varchar(1) | NO | | NULL | |
| strand_id | int(11) | NO | | NULL | |
| strand_props_id | int(11) | NO | | NULL | |
| version_id | int(11) | NO | PRI | NULL | |
+-----------------+-------------+------+-----+---------+-------+
Primary key on version id, zipcode and house_no Index on zipcode and house_no and another index on zipcode, both using BTREE index.
The application might sometimes be used to do 1 million distinctive select queries at which point it just takes too long.
Upvotes: 0
Views: 226
Reputation: 9711
The 'group by' in your second SQL query is unnecessary and killing performance. For maximum performance, replace this select (the second one in your code) ...
select * from zipcode_addresses where zipcode = ?
group by houseNo having min(houseNo)
with this ...
select min(houseNo) from zipcode_addresses where zipcode = ?
Also, ensure you have an index on zipcode + houseNo (which it looks like you have - from the updated post).
Upvotes: 0
Reputation: 7267
Your code snippet doesn't show how your statements are being prepared. If your statements are being called numerous times then you should take a look at the PreparedStatement object:
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Your statements can then be cached to reduce future overhead.
Upvotes: 1
Reputation: 1382
A lot depends on the usage pattern. How many queries you run, how often there is a ZIP code miss, etc. First off, I would use PreparedStatements where posssible. I am not that familiar with MySQL, but they are usually cached and reused by the connection-database, that will help with performance. Next, If ZIP code misses were common, I would probably build an in memory cache of ZIP codes to short circuit doing 3 queries on a miss. After that, I might make a view that's ZIP + house number. Going further depends more on how your appliation works, but these things would help.
Upvotes: 0
Reputation: 968
you could create a stored procedure with two parameters, and use the House Number as optional, or just let that the procedure finds if exists or don't.
Upvotes: 0