kmligue
kmligue

Reputation: 99

Need help in SQL query performance

I have this table:

 CREATE TABLE `property_ads_history` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `property_ad_id` int(10) unsigned NOT NULL,
 `advertiser_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `agency_reference_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `average_sale_price` double(8,2) NOT NULL DEFAULT '-1.00',
 `bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `bedrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `carports` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_country` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_reason` enum('Geocoded','Sanitized Parking','Sanitized Representation','Sanitized Address','Scraped','URL Inserted','QA Sanitized Address','QA Sanitized Representation','QA Sanitized Parkings') COLLATE utf8_unicode_ci DEFAULT NULL,
 `description` longtext COLLATE utf8_unicode_ci NOT NULL,
 `ensuite_bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_ad_expired_at` datetime NOT NULL,
 `floor_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `formatted_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `garages` double(8,2) NOT NULL DEFAULT '-1.00',
 `geocode_status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `is_represented` tinyint(1) DEFAULT NULL,
 `land_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `latitude` double(10,6) NOT NULL,
 `location_id` int(10) unsigned DEFAULT NULL,
 `longitude` double(10,6) NOT NULL,
 `off_street_parkings` double(8,2) NOT NULL DEFAULT '-1.00',
 `official_property_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `page_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `parking` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `posted_at` datetime NOT NULL,
 `posted_at_string` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `postal_code` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `price` double(10,2) NOT NULL DEFAULT '-1.00',
 `primary_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `DELETE_property_ad_created_at` datetime NOT NULL,
 `property_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `rateable_value` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_1` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_2` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_3` double(10,2) NOT NULL DEFAULT '-1.00',
 `reviewer_comments` longtext COLLATE utf8_unicode_ci NOT NULL,
 `route` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `source_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `street_number` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `user_id` int(10) unsigned DEFAULT NULL,
 `user_provided_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `deleted_at` timestamp NULL DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `unique_index` (`created_at`,`created_reason`,`source_id`),
 KEY `property_ads_history_property_ad_id_foreign` (`property_ad_id`),
 KEY `property_ads_history_location_id_foreign` (`location_id`),
 KEY `created_reason` (`created_reason`)
);

This is my SQL query:

SELECT * FROM `property_ads_history` `t1` 
WHERE `t1`.`created_at` >= '2016-04-13 00:00:00' 
AND `t1`.`created_reason` = 'Scraped' 
AND (`t1`.`price` > -1 OR `t1`.`price` <> 999999.99) 
AND (
  SELECT `t2`.`price` FROM `property_ads_history` `t2` 
  WHERE `t2`.`property_ad_id` = `t1`.`property_ad_id` 
  AND `t2`.`created_at` < `t1`.`created_at` 
  AND (`t2`.`price` > -1 OR `t2`.`price` <> 999999.99) ORDER BY  
  DATE(`t2`.`created_at`) DESC LIMIT 1
) <> `t1`.`price` GROUP BY `t1`.`property_ad_id`;

All I want in that query are:

  1. I want to get the records that meet certain criteria, particularly that the created_at field is within the last 24 hours

  2. I need to get the record that immediately precedes the records in #1

  3. further filter results in #1 to records whose price column has had a different value in a record's history, other than -1

The problem I'm facing is that the query is too slow. I have a million records. It would take more or less 2 mins for the query to execute. I'm guessing GROUP BY makes the query slow. I think I need to make a composite index but I'm not sure how to do it.

Explain result:


id | select_type        | table | type | possible_keys                                             | key                                         | key_len | ref                                | rows   | Extra
1  | PRIMARY            | t1    | ref  | unique_index,created_reason                               | created_reason                              | 2       | const                              | 590030 | Using where; Using temporary; Using filesort
2  | DEPENDENT SUBQUERY | t2    | ref  | unique_index,property_ads_history_property_ad_id_foreign  | property_ads_history_property_ad_id_foreign | 4       | reis_real_estate.t1.property_ad_id | 7      | Using where; Using filesort

Upvotes: 1

Views: 187

Answers (1)

Lussi
Lussi

Reputation: 3

Instead of using subquery, use joins it will reduce the query time. below query might help you.

select  t1.*, t2.price
    from  `property_ads_history` AS `t1`
    INNER JOIN `property_ads_history` As `t2`
    where  `t1`.`created_at` >= '2016-04-13 00:00:00'
      AND  `t1`.`created_reason` = 'Scraped'
      AND  (`t1`.`price` > -1 OR  `t1`.`price` <> 999999.99 )
      AND  `t2`.`created_at` < `t1`.`created_at`
      AND  (`t2`.`price` > -1 OR  `t2`.`price` <> 999999.99 )
    ORDER BY  DATE(`t2`.`created_at`) DESC
    GROUP BY  `t1`.`property_ad_id`;

Also, put indexes on your column that will also reduce time of your query.

Upvotes: 0

Related Questions