Reputation: 13
I have the following table structure:
term_id int(11)
concept_id int(11)
idf double
And I have around 2.4 million records. When I perform the following query
stmt = con.prepareStatement("SELECT term, idf.idf FROM idf "
        + "JOIN term ON term.id = idf.term_id "
        + "WHERE concept_id = ?");
stmt.setInt(1, id);
res = stmt.executeQuery();
It takes about 3 seconds per query. I am using MySQL Connector/J, and I am running this query from Java code in Eclipse.
Could I improve this speed?
If I try running the query directly in phpMyAdmin, I get the following timing:
Records: 18,869 total, Time: 0.4204 sec
SELECT term, idf.idf
FROM idf
JOIN term ON term.id = idf.term_id
WHERE concept_id = 4
Upvotes: 1
Views: 286
Reputation: 76753
The problem is in this line:
Records: 18,869 total, Time: 0.4204 sec
You are trying to push 18,869 records from the remote host to your local client.
Of course you don't need 18k records; who has the time to read all that?
So you probably only need the top few rows.
Or you need to do further processing.
Get top 100 rows
SELECT term.*, idf.idf
FROM idf
JOIN term ON term.id = idf.term_id
WHERE concept_id = 4
ORDER BY term.id
LIMIT 100;
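In JDBC that could look like the following sketch (con and id are assumed to be set up as in the question; the cap of 100 rows is an illustrative choice, not a fixed rule):
PreparedStatement stmt = con.prepareStatement(
        "SELECT term.*, idf.idf FROM idf "
      + "JOIN term ON term.id = idf.term_id "
      + "WHERE concept_id = ? "
      + "ORDER BY term.id "    // stable order, so LIMIT is deterministic
      + "LIMIT 100");          // only 100 rows cross the network
stmt.setInt(1, id);
ResultSet res = stmt.executeQuery();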
Store the data in a memory table for further processing
INSERT INTO mem_idfterm
SELECT term.*, idf.idf
FROM idf
JOIN term ON term.id = idf.term_id
WHERE concept_id = 4
ORDER BY term.id;
And then do further selects from mem_idfterm,
or use that in-memory table in a subselect to extract more data.
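Note that mem_idfterm has to exist before the INSERT ... SELECT runs; here is a minimal sketch of creating and filling it from JDBC (the column definitions are assumptions based on the schema shown in the question):
Statement setup = con.createStatement();
// MEMORY engine keeps the working set in RAM on the server.
// The column list is assumed; adjust it to match term.* plus idf.idf.
setup.execute("CREATE TABLE IF NOT EXISTS mem_idfterm ("
        + "id int(11), "
        + "term varchar(255), "
        + "idf double"
        + ") ENGINE=MEMORY");

PreparedStatement fill = con.prepareStatement(
        "INSERT INTO mem_idfterm "
      + "SELECT term.*, idf.idf FROM idf "
      + "JOIN term ON term.id = idf.term_id "
      + "WHERE concept_id = ? "
      + "ORDER BY term.id");
fill.setInt(1, id);
fill.executeUpdate();  // runs entirely on the server, nothing is pushed to the client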
Try to never push more than 1000 rows across the network.
Finally, you can get the data in batches:
SELECT term.*, idf.idf
...
LIMIT 0, 100; -- first batch
SELECT term.*, idf.idf
...
LIMIT 100, 100; -- second batch
etc.
I recommend using an ORDER BY clause when using LIMIT. And if you must have consistent results, do all the piecemeal selects in a single transaction.
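Put together, the batched version could look like this JDBC sketch (BATCH_SIZE and the commented-out processRow hook are illustrative, not from the question):
final int BATCH_SIZE = 100;
con.setAutoCommit(false);  // one transaction, so every batch sees the same snapshot
try {
    PreparedStatement stmt = con.prepareStatement(
            "SELECT term.*, idf.idf FROM idf "
          + "JOIN term ON term.id = idf.term_id "
          + "WHERE concept_id = ? "
          + "ORDER BY term.id "
          + "LIMIT ?, ?");   // offset, batch size
    for (int offset = 0; ; offset += BATCH_SIZE) {
        stmt.setInt(1, id);
        stmt.setInt(2, offset);
        stmt.setInt(3, BATCH_SIZE);
        int rows = 0;
        try (ResultSet res = stmt.executeQuery()) {
            while (res.next()) {
                rows++;
                // processRow(res);  // hypothetical per-row handling
            }
        }
        if (rows < BATCH_SIZE) {
            break;  // last (partial or empty) batch
        }
    }
    con.commit();
} finally {
    con.setAutoCommit(true);
}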
Upvotes: 1