karan
karan

Reputation: 29

Query Performance in MySql When working with large amount of data

I am having table that contains 20000 data. and i have to create new table that calculates some records about analysis.In new table i have to create 25 coloumns. from which i have created query and script that calculates all records according to my condition. but last three coloumns takes lots of time. i already optimized queries that calculates other 23 coloumns. now the query for last three coloumns is as below : (1) for annual not renewals

select  count(*)
    from  payment
    where  (find_in_set('Standard',product)
              OR  find_in_set('Lite',product)
              OR  find_in_set('Basic',product)
              OR  find_in_set('Nifty50',product)
              OR  find_in_set('NiftyOnly',product)
           )
      AND  '" + subscriptionDate + "' NOT BETWEEN subscr_date AND end_date
      AND  payment_type ='annual'
    HAVING  (MIN(subscr_date) < '" + subscriptionDate + "');

(2) for quarter not renewals:

select  count(*)
    from  payment
    where  (find_in_set('Standard',product)
              OR  find_in_set('Lite',product)
              OR  find_in_set('Basic',product)
              OR  find_in_set('Nifty50',product)
              OR  find_in_set('NiftyOnly',product)
           )
      AND  '" + subscriptionDate + "' NOT BETWEEN subscr_date AND end_date
      AND  (payment_type ='quarter'
              OR  payment_type='Quarterly'
           )
    HAVING  (MIN(subscr_date) < '" + subscriptionDate + "');

(3) for month not renewed :

select  count(*)
    from  payment
    where  (find_in_set('Standard',product)
              OR  find_in_set('Lite',product)
              OR  find_in_set('Basic',product)
              OR  find_in_set('Nifty50',product)
              OR  find_in_set('NiftyOnly',product)
           )
      AND  '" + subscriptionDate + "' NOT BETWEEN subscr_date AND end_date
      AND  payment_type ='month'
    HAVING  (MIN(subscr_date) < '" + subscriptionDate + "');

I am using find_in_set() for checking only products .this 3 queries will run under while loop and that loop will be iterated for 20000 data. now how can i reduce execution time for this queries?

Upvotes: 0

Views: 1802

Answers (3)

Rick James
Rick James

Reputation: 142298

HAVING, and the COUNT and MIN don't make sense without a GROUP BY. After fixing (or explaining) this, we may be able to help you further.

Try to write complex SQL, not iterate over rows. That will probably speed things up.

Let's see SHOW CREATE TABLE. Those queries need this:

INDEX(payment_type, subscr_date)

WHERE "constant" BETWEEN start AND end is hard to optimize.

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562358

You seem to have a column product which must be a comma-separated string of product names.

You can't optimize this query with indexes. This is one of many reasons why storing comma-separated lists of values is a bad idea in a relational database.

Think of a telephone book. If I ask you to find all entries in the phone book whose first name is 'Karan', it's hard to do. They could be anywhere in the book, not grouped together, because the sorted order of entries isn't by first name.

It's a similar problem when you search for an entry in a comma-separated list like you're doing. The value you're looking for might be at the start of the list, or at the end, or in the middle. Even if there's an index on the column, it can't use the sorted order of the index to find entries in the middle or at the end of the comma-separated list.

You shouldn't use comma-separated lists if you want to search for individual elements in the list. You should store the elements one per row in a second table that refers to your payment table.

This is a really basic database normalization practice, which you should have studied if you want to do database design. This is probably why Strawberry thought it was appropriate to post his totally unhelpful and dismissive comments (though he/she is wrong—it's never appropriate to be rude).

Upvotes: 1

Dimgold
Dimgold

Reputation: 2944

SQL queries run better when simple boolean conditions are applied on rows. Since most of your conditions are find_in_set, this query needs to perform a numerous amount of actions per each row.

The best practice will be to normalize the data - in your case you'll need to bring it to 1NF - which is also known as unpivotting or melting.

Instead of a row that looks like this:

KEY SET

1 o1,o2,o3

the table will look like this:

KEY Option

1 o1

1 o2

1 o3

Upvotes: 0

Related Questions