JPBro
JPBro

Reputation: 85

SQLite - Help optimizing aggregate total of previous rows with multiple conditions

I'm trying to get conditional SUMs of the Value column for each record in the table for all of the "previous" records grouped by the same "Category" field value, and the same "Approved" field value, then divided into Negative and Positive sums.

In my program, users can create document record in any order, so "previous" is defined as:

If Approved=TRUE, then "previous" records have an older ApprovedDate field value than the current record. If the ApprovedDate field values are the same, then "previous" records have a lower DocumentNumber field value.

If Approved=FALSE, then "previous" records have an older IssuedDate field value than the current record. If the IssuedDate field values are the same, then the "previous" records have a lower DocumentNumber field value.

For example, in the following table:

CREATE TABLE Changes (GUID TEXT, Value REAL, DocumentNumber TEXT, Approved INTEGER, ApprovedDate TEXT, IssuedDate TEXT, Category TEXT);
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('4F7253A4E1B3D841B84D4A82B4F0E7A2', '11', 0, 18526.7, '', '2009-03-31T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('D97537852E927B499C21C14F3D13CF06', '1', 0, 0, '', '2008-11-10T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('857DADB463807345918729B33399B36F', '2', 0, 0, '', '2008-11-10T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('7989D242E05AFF4FB5EE99114822BF80', '21', 0, 50112, '', '2009-07-22T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('16A0AB27FD3A784D9E0A14406C7683E0', '3', 0, 0, '', '2009-01-15T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('D3D7B1C306D38C438FC3DEDFCB57D411', '131', 0, 17204, '', '2010-12-14T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('2C89D974DDF86743A0D7D62B385FBDEF', '147', 0, 0, '', '2010-12-01T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('F371D4237C837D448824697EB0162905', '198', 0, 0, '', '2011-01-10T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('433D64C871AE4E46A0E1BFCE2BB69BA7', '364', 0, 0, '', '2011-11-14T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('808496DBDE76CB4F911396BB817724F3', '352', 0, 0, '', '2011-10-17T05:00:00Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('9545DEF1666B5F4D8626F19F8E9E9333', '418', 0, 10948, '', '2012-03-07T22:19:18Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('244D7D89B79E0F4E91100E4ADB300656', '439', 0, 50945, '', '2012-04-27T20:33:26Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('115A427BBB1D2C43BA11D9E5875FAA2C', '465', 0, 480049, '', '2012-07-20T16:17:54Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('3A2271EFCC767E4CA40017E68802F10C', '478', 0, 54298, '', '2012-08-01T17:26:38Z', 'UNKNOWN');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('99D0EFC5A9F1AA498DB1A4CDF294129B', '490', 0, 11500, '', '2012-09-18T14:23:13Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('38B2E3A379C5084998E6A84D496AC555', '491', 0, 26088, '', '2012-09-25T06:00:00Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('8902831C8FAD4941841EE2847656BDAF', '494', 0, -825, '', '2012-10-16T14:20:06Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('7AFDB08A002AE54A8DE7699855AEBE30', '495', 0, 221, '', '2012-10-16T14:21:27Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('38A2CCEF5F0B294AA8B8752F461D121D', '496', 0, 0, '', '2012-12-24T01:11:15Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('24CCD5CE409E674593108CBD816DBCCE', '486', 1, -825, '2012-10-01T21:42:52Z', '2012-09-17T20:42:12Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('C7458704E36C8F448C1F3A485EB08304', '485', 1, 10000, '2012-10-01T21:25:56Z', '2012-09-11T21:29:44Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('B511953AE6FB6446A63AA83C159057BE', '487', 1, 82170, '2012-10-01T21:42:51Z', '2012-09-17T20:46:41Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('EC977BC304A971439D04BB9DF4D8188A', '488', 1, 15500, '2012-10-01T20:58:15Z', '2012-09-18T06:00:00Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('D9B1F0C0A8E490448697B783639E09E0', '489', 1, 11503, '2012-10-01T21:42:50Z', '2012-09-18T13:56:18Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('698BB6D65832D146A49727C717A591A1', '492', 1, 2787, '2012-10-01T21:10:06Z', '2012-09-25T15:55:02Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('155D4F2B1854B34FABCDE8CF20F1E44C', '493', 1, 12162, '2012-10-01T21:10:06Z', '2012-09-25T16:04:40Z', 'ALTER');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('137C9BF2B1EFD34B8831ADA70C5F9431', '1', 1, 369543, '2011-12-08T13:41:04Z', '1899-12-30T05:00:00Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('7F29FC7114BD10468AE92A047345B5DB', '2', 1, 7258, '2011-12-08T13:41:04Z', '2011-10-20T05:00:00Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('6B66D8EAD88E6E4FA29401CD524B978A', '3', 1, 979321, '2011-12-08T13:41:04Z', '2011-11-08T05:00:00Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('7F393B712B213041A6DD211E04F6DCA6', '4', 1, 14998, '2012-04-20T15:16:21Z', '2012-04-18T21:07:07Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('2255F84E7C7DA04389765724872D6413', '5', 1, 58926, '2012-04-20T15:16:23Z', '2012-04-18T21:13:15Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('DB4A5588DEB9F34C868F7AD1CB13ACC3', '6', 1, 13232, '2012-04-20T15:16:05Z', '2012-04-18T21:17:00Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('B5231AE40F8E7D41BA0A4D09614CBDF9', '7', 1, 10176, '2012-04-20T15:16:25Z', '2012-04-18T21:19:41Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('2362D54FCC53E447AC7D8289EA89FD05', '8', 1, 17556, '2012-04-20T15:16:04Z', '2012-04-18T21:21:20Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('6ED4565CA041704B8D006EDA4A1E4CF9', '9', 1, 399639, '2012-05-30T16:32:43Z', '2012-05-17T06:00:00Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('B21BE07E3E42C2418C70AD17862D3AE1', '10', 1, 6231, '2012-08-16T16:55:00Z', '2012-08-02T16:02:03Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('8FD252A50137754A98698F93AC9B01A7', '11', 1, 629, '2012-08-16T16:54:58Z', '2012-08-02T16:07:57Z', 'DRAW');
INSERT INTO Changes (GUID, DocumentNumber, Approved, Value, ApprovedDate, IssuedDate, Category) VALUES ('1B9AFD2C20362F48A486E8A535B29AF5', '20', 1, -113810, '2011-12-13T17:15:53Z', '2010-02-10T05:00:00Z', 'UNKNOWN');

The results should be:

[Results][2]:

|                             GUID |                             SORTID |          VALUE | POSITIVE_PREVIOUS_TOTAL | NEGATIVE_PREVIOUS_TOTAL |
----------------------------------------------------------------------------------------------------------------------------------------------
| 99D0EFC5A9F1AA498DB1A4CDF294129B |   ALTER_0_2012-09-18T14:23:13Z_490 |          11500 |                       0 |                       0 |
| 38B2E3A379C5084998E6A84D496AC555 |   ALTER_0_2012-09-25T06:00:00Z_491 |          26088 |                   11500 |                       0 |
| 8902831C8FAD4941841EE2847656BDAF |   ALTER_0_2012-10-16T14:20:06Z_494 |           -825 |                   37588 |                       0 |
| 7AFDB08A002AE54A8DE7699855AEBE30 |   ALTER_0_2012-10-16T14:21:27Z_495 |            221 |                   37588 |                    -825 |
| 38A2CCEF5F0B294AA8B8752F461D121D |   ALTER_0_2012-12-24T01:11:15Z_496 |              0 |                   37809 |                    -825 |
| EC977BC304A971439D04BB9DF4D8188A |   ALTER_1_2012-10-01T20:58:15Z_488 |          15500 |                   92170 |                    -825 |
| 698BB6D65832D146A49727C717A591A1 |   ALTER_1_2012-10-01T21:10:06Z_492 |           2787 |                  119173 |                    -825 |
| 155D4F2B1854B34FABCDE8CF20F1E44C |   ALTER_1_2012-10-01T21:10:06Z_493 |          12162 |                  121960 |                    -825 |
| C7458704E36C8F448C1F3A485EB08304 |   ALTER_1_2012-10-01T21:25:56Z_485 |          10000 |                       0 |                       0 |
| D9B1F0C0A8E490448697B783639E09E0 |   ALTER_1_2012-10-01T21:42:50Z_489 |          11503 |                  107670 |                    -825 |
| B511953AE6FB6446A63AA83C159057BE |   ALTER_1_2012-10-01T21:42:51Z_487 |          82170 |                   10000 |                    -825 |
| 24CCD5CE409E674593108CBD816DBCCE |   ALTER_1_2012-10-01T21:42:52Z_486 |           -825 |                   10000 |                       0 |
| 137C9BF2B1EFD34B8831ADA70C5F9431 |      DRAW_1_2011-12-08T13:41:04Z_1 |         369543 |                       0 |                       0 |
| 7F29FC7114BD10468AE92A047345B5DB |      DRAW_1_2011-12-08T13:41:04Z_2 |           7258 |                  369543 |                       0 |
| 6B66D8EAD88E6E4FA29401CD524B978A |      DRAW_1_2011-12-08T13:41:04Z_3 |         979321 |                  376801 |                       0 |
| 2362D54FCC53E447AC7D8289EA89FD05 |      DRAW_1_2012-04-20T15:16:04Z_8 |          17556 |                 1453454 |                       0 |
| DB4A5588DEB9F34C868F7AD1CB13ACC3 |      DRAW_1_2012-04-20T15:16:05Z_6 |          13232 |                 1430046 |                       0 |
| 7F393B712B213041A6DD211E04F6DCA6 |      DRAW_1_2012-04-20T15:16:21Z_4 |          14998 |                 1356122 |                       0 |
| 2255F84E7C7DA04389765724872D6413 |      DRAW_1_2012-04-20T15:16:23Z_5 |          58926 |                 1371120 |                       0 |
| B5231AE40F8E7D41BA0A4D09614CBDF9 |      DRAW_1_2012-04-20T15:16:25Z_7 |          10176 |                 1443278 |                       0 |
| 6ED4565CA041704B8D006EDA4A1E4CF9 |      DRAW_1_2012-05-30T16:32:43Z_9 |         399639 |                 1471010 |                       0 |
| 8FD252A50137754A98698F93AC9B01A7 |     DRAW_1_2012-08-16T16:54:58Z_11 |            629 |                 1876880 |                       0 |
| B21BE07E3E42C2418C70AD17862D3AE1 |     DRAW_1_2012-08-16T16:55:00Z_10 |           6231 |                 1870649 |                       0 |
| D97537852E927B499C21C14F3D13CF06 |   UNKNOWN_0_2008-11-10T05:00:00Z_1 |              0 |                       0 |                       0 |
| 857DADB463807345918729B33399B36F |   UNKNOWN_0_2008-11-10T05:00:00Z_2 |              0 |                       0 |                       0 |
| 16A0AB27FD3A784D9E0A14406C7683E0 |   UNKNOWN_0_2009-01-15T05:00:00Z_3 |              0 |                       0 |                       0 |
| 4F7253A4E1B3D841B84D4A82B4F0E7A2 |  UNKNOWN_0_2009-03-31T05:00:00Z_11 | 18526.69921875 |                       0 |                       0 |
| 7989D242E05AFF4FB5EE99114822BF80 |  UNKNOWN_0_2009-07-22T05:00:00Z_21 |          50112 |          18526.69921875 |                       0 |
| 2C89D974DDF86743A0D7D62B385FBDEF | UNKNOWN_0_2010-12-01T05:00:00Z_147 |              0 |          68638.69921875 |                       0 |
| D3D7B1C306D38C438FC3DEDFCB57D411 | UNKNOWN_0_2010-12-14T05:00:00Z_131 |          17204 |          68638.69921875 |                       0 |
| F371D4237C837D448824697EB0162905 | UNKNOWN_0_2011-01-10T05:00:00Z_198 |              0 |          85842.69921875 |                       0 |
| 808496DBDE76CB4F911396BB817724F3 | UNKNOWN_0_2011-10-17T05:00:00Z_352 |              0 |          85842.69921875 |                       0 |
| 433D64C871AE4E46A0E1BFCE2BB69BA7 | UNKNOWN_0_2011-11-14T05:00:00Z_364 |              0 |          85842.69921875 |                       0 |
| 9545DEF1666B5F4D8626F19F8E9E9333 | UNKNOWN_0_2012-03-07T22:19:18Z_418 |          10948 |          85842.69921875 |                       0 |
| 244D7D89B79E0F4E91100E4ADB300656 | UNKNOWN_0_2012-04-27T20:33:26Z_439 |          50945 |          96790.69921875 |                       0 |
| 115A427BBB1D2C43BA11D9E5875FAA2C | UNKNOWN_0_2012-07-20T16:17:54Z_465 |         480049 |         147735.69921875 |                       0 |
| 3A2271EFCC767E4CA40017E68802F10C | UNKNOWN_0_2012-08-01T17:26:38Z_478 |          54298 |         627784.69921875 |                       0 |
| 1B9AFD2C20362F48A486E8A535B29AF5 |  UNKNOWN_1_2011-12-13T17:15:53Z_20 |        -113810 |                       0 |                       0 |

Essentially - all records are group by the Category and Approved field values, then sorted by the Approved Date, DocumentNumber (if Approved=TRUE), or the IssuedDate, DocumentNumber (if Approved=FALSE). Finally a running sum on a record-by-record basis is tallied for the value of each previous record with the same category and approved value, with negative sums and positive sums reported in separate columns.

I can get the desired results with sub-selects, but the performance is poor (700-ish records takes 6.5 seconds). I've been trying to use CASE WHEN statements combined with various GROUP BY and HAVING clauses, but I can't seem to get the correct values (I can't get the logic to determine the "previous" records to meet my requirements). However, using this approach the performance is very good (less than 200ms in most cases for the same data set).

Here's my attempt that works (SQLite only, T-SQL below), but takes a long time:

SELECT a.GUID, 
SUM(CASE WHEN b.Value>0 THEN b.Value ELSE 0 END) as positive_previous_total, 
SUM(CASE WHEN b.Value<0 THEN b.Value ELSE 0 END)  as negative_previous_total 
FROM Changes AS a left join Changes as b 
ON b.rowid != a.rowid 
AND b.Approved =a.Approved 
AND b.Category=a.Category 
AND ((IFNULL(SUBSTR(CASE WHEN b.Approved THEN b.ApprovedDate ELSE b.IssuedDate END, 1, 10), '0000-00-00')<IFNULL(SUBSTR(CASE WHEN a.Approved THEN a.ApprovedDate ELSE a.IssuedDate END, 1, 10), '0000-00-00') 
OR ((IFNULL(SUBSTR(CASE WHEN b.Approved THEN b.ApprovedDate ELSE b.IssuedDate END, 1, 10), '0000-00-00')=IFNULL(SUBSTR(CASE WHEN a.Approved THEN a.ApprovedDate ELSE a.IssuedDate END, 1, 10), '0000-00-00') AND b.DocumentNumber<a.DocumentNumber)))) 
GROUP BY a.rowid

Here's my attempt that works (T-SQL)

   SELECT 
    a.[GUID], 
    [positive_previous_total] = SUM(CASE WHEN b.Value>0 THEN b.Value ELSE 0 END), 
    [negative_previous_total] = SUM(CASE WHEN b.Value<0 THEN b.Value ELSE 0 END) 
FROM 
    #Changes a 
    LEFT OUTER JOIN #Changes b 
        ON 
        b.[GUID]    <> a.[GUID] AND
        b.Approved  = a.Approved AND
        b.Category  = a.Category 
        AND 
        (
        ISNULL(SUBSTRING(CASE WHEN b.Approved=1 THEN b.ApprovedDate ELSE b.IssuedDate END, 1, 10), '0000-00-00') 
          < ISNULL(SUBSTRING(CASE WHEN a.Approved=1 THEN a.ApprovedDate ELSE a.IssuedDate END, 1, 10), '0000-00-00')
        OR 
            (
            ISNULL(SUBSTRING(CASE WHEN b.Approved=1 THEN b.ApprovedDate ELSE b.IssuedDate END, 1, 10), '0000-00-00')
              =ISNULL(SUBSTRING(CASE WHEN a.Approved=1 THEN a.ApprovedDate ELSE a.IssuedDate END, 1, 10), '0000-00-00') 
            AND 
            b.DocumentNumber<a.DocumentNumber
            )
        ) 
GROUP BY a.[GUID]

Does anybody have any suggestions for getting the results I need more optimally?

Thanks in advance for any help.

Upvotes: 2

Views: 786

Answers (2)

JPBro
JPBro

Reputation: 85

First of all - thanks to whytheq and the contributors in the following thread: Can this be simplified and therefore optimized which is worth a read even though the solutions are not all applicable to SQLite due to its limitations. I was able to implement some optimizations based on the suggestions to cut the SELECT time in half.

How it was done:

  1. Added and EffectiveDate column that stores the date portion of the ApprovedDate when Approved=TRUE, or IssuedDate when Approved=FALSE as an integer instead of as ISO8601 text.

  2. Added a new column called EffectiveID that stores the Category, Approved, zero-padded EffectiveDate and zero-padded DocumentNumber as a single value for quicker comparisons.

  3. Created an UPDATE TRIGGER on update of the Approved, IssuedDate and ApprovedDate columns to update the EffectiveDate column.

  4. Created a UPDATE TRIGGER on the EFfectiveDate, Category, Approved, and Number columns to update the EffectiveID field accordingly.

  5. Created a TRIGGER on insert of a new row to update the EffectiveDate accordingly.

  6. Added INDEXes for the EffectiveDate and EffectiveID columns

  7. Rearranged the ON clause tests to reduce the number of matches, reducing processing

  8. Changed the JOIN to select only the required subset of columns in the result set.

The final SQL is as follows:

SELECT 
a.GUID, 
TOTAL(CASE WHEN b.Value>0 THEN b.Value ELSE 0 END) as positive_previous_total, 
TOTAL(CASE WHEN b.Value<0 THEN b.Value ELSE 0 END) as negative_previous_total 
FROM 
Changes AS a 
LEFT JOIN 
(SELECT rowid, Value, Category, Approved, EffectiveId FROM Changes) as b 
ON 
b.EffectiveID < a.EffectiveID 
AND b.Category=a.Category
AND b.Approved=a.Approved
AND b.rowid != a.rowid
GROUP BY 
a.rowid

Upvotes: 2

whytheq
whytheq

Reputation: 35577

Please refer to suggestions HERE ON SO for ways you can speed things up.

Two or three SO heavyweights have contributed so I'd get in touch with your DBA and see what changes can be made.

Seems the consensus is that you need to add some indexing on the changes table.

I also likes the idea of adding a couple of calculated fields into this table to then make the subsequent script a lot more readable.

If you are able to change some of the data types then there's lots of suggestions in the answers.

Upvotes: 1

Related Questions