Reputation: 11
I have a problem concerning a database created in MySQL Workbench. My database has 2 tables, data and experiments, created as:
create table data(
preprocessed_datasetID int NOT NULL PRIMARY KEY AUTO_INCREMENT,
name varchar(1000), samples int, attributes int, missing_values bool,
y_test LONGBLOB, x_test LONGBLOB, y_train LONGBLOB, x_train LONGBLOB, y_train_noise LONGBLOB, iteration int,
dataset_source varchar(1000), note varchar(4000),
test_size float,
label_noise_percentage_lvl float, random_label_noise_percentage_lvl float, label_noise_percentage_lvl_ones float, label_noise_percentage_lvl_zeros float, real_noise float,
datasetID int
);
create table experiments (
experimentID int NOT NULL PRIMARY KEY AUTO_INCREMENT,
preprocessed_datasetID int,
datasetID int,
dataset_source varchar(1000), note varchar(4000),
test_size float,
label_noise_percentage_lvl float, random_label_noise_percentage_lvl float, label_noise_percentage_lvl_ones float, label_noise_percentage_lvl_zeros float, real_noise float,
samples int, attributes int, missing_values bool,
classifier_name varchar(1000), classifier_parameters varchar(5000), saved_classifier_id varchar(1000),
accuracy float, results_precision float, recall float, ft_pn BLOB, mcc float, f1score float, confusion_matrix BLOB, predictions LONGBLOB
#,e1 bool, e2 bool, e3 bool, e4 bool, e5 bool, e6 bool, e7 bool, e8 bool, e9 bool, e10 bool
);
I use them in a program written in Python, in which I run the query:
select * from data order by preprocessed_datasetID asc limit x, y
I iterate through the data table, perform some operations, and save the results in the experiments table.
My problem is that the above query takes a very long time to execute. I know my table stores a lot of data, but even a query like:
select * from data order by preprocessed_datasetID asc limit 6000, 2
can take as long as 90 seconds.
I didn't manage to find any solution concerning database optimization. Is there anything I can do to make the execution faster?
Thanks in advance! ;)
Upvotes: 0
Views: 58
Reputation: 142560
Perhaps the only thing possible is to have
INDEX(preprocessed_datasetID)
Skipping over 6000 rows is slow because each one must actually be touched. The suggested index makes it a lighter touch. In fact, without the index, MySQL must copy every row, then sort them all. Only then does it step over 6000 rows to deliver the 2 you asked for.
With the index, the copy and sort are avoided.
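For example, the index can be added to the existing table like this (the index name idx_ppd is arbitrary):

```sql
ALTER TABLE data ADD INDEX idx_ppd (preprocessed_datasetID);
```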
Another tip: if you don't need all the columns, name only the ones you want. This matters especially when the table has TEXT or BLOB columns.
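For instance, a query that pulls only the metadata columns (names taken from the data table above) avoids dragging the LONGBLOB columns through the sort:

```sql
SELECT preprocessed_datasetID, name, samples, attributes
FROM data
ORDER BY preprocessed_datasetID ASC
LIMIT 6000, 2;
```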
Another approach that may run faster:
SELECT data.*
FROM (
    SELECT preprocessed_datasetID
    FROM data
    ORDER BY preprocessed_datasetID ASC
    LIMIT 6000, 2
) AS x
JOIN data USING(preprocessed_datasetID)
ORDER BY data.preprocessed_datasetID ASC
The technique is known as a "deferred join" (or "late row lookup"): the subquery skips over the rows using only the index, and just the 2 full rows you actually want are fetched.
Upvotes: 1