Starx
Starx

Reputation: 79021

Query to find the sum of rows in one table which equals to one or more row on another table

I need help with a query. The task is to get rows of one table whose amount will sum up to become a value in another table and vice versa.

An illustration of the example:

Table 1:                                Table2:

  r_id   |    r_date   |   r_amt          p_id   |    p_date   |   p_amt    
---------+-------------+--------        ---------+-------------+--------
    1    |  2/23/2012  |   200              1    |  3/22/2012  |   450
---------+-------------+--------        ---------+-------------+--------
    2    |  3/21/2012  |   100              2    |  5/25/2012  |   530
---------+-------------+--------        ---------+-------------+--------
    3    |  4/12/2012  |   300              3    |  5/26/2012  |   700
---------+-------------+--------        ---------+-------------+--------
    4    |  4/18/2012  |   250              4    |  5/26/2012  |   40
---------+-------------+--------        ---------+-------------+--------
    5    |  5/20/2012  |   130
---------+-------------+--------
    6    |  5/21/2012  |   740
---------+-------------+--------

Now these test datas are in such a way that, few rows of table 1 will sum up to become one row in table 2 and vice versa.

I want a query to analyse the above data in such a way that the sum of records in one table will be equal to one row in the other.

After the analyze is complete it should feed the data to a new table like this.

Lets call this table match

  m_id   |    tbl1     |   tbl2   | match_type
---------+-------------+----------+-----------
    1    |    1,4      |   1      |   n-1
---------+-------------+----------+-----------
    1    |    2,3,5    |   2      |   n-1
---------+-------------+----------+-----------
    1    |     6       |   3,4    |   1-n
---------+-------------+----------+-----------

Right now, I am calculating the sum of each table and entering on a temporary table then comparing with that table to get the above result. But for more than 10 rows the queries become really slow and hang my development server.

Link to my test Queries

What is the efficient way to perform this task?

Upvotes: 1

Views: 1093

Answers (1)

Chris Travers
Chris Travers

Reputation: 26464

Ok, so here is a rough answer. I have not tested it. Recursive CTE's have some odd gotchas and it is possible I ran into one, but this should get you going. There may also be some performance tuning possible but this may get you going.

The algorithm roughly goes as follows:

  1. Generate all permutations of all rows

  2. Compare each permutation on one side with each row on the other

The first will be done with recursive CTE's. The second with a simple join.

WITH RECURSIVE table1_combos as (
     SELECT r_id as last_id, r_id::text as path, r_amt as amount
       FROM table1
  UNION ALL 
     SELECT r.r_id as last_id, p.path || ',' || r_id::text, p.amount + r_amt
       FROM table1_combos p
 CROSS JOIN table1 r
      WHERE r.r_id < p.last_id
),
RECURSIVE table2_combos AS (
     SELECT p_id as last_id, p_id::text as path, p_amt as amount
       FROM table2
  UNION ALL 
     SELECT p_id AS last_id, p.path || ',' || p_id::text, p.amount + p_amt
       FROM table2_combos p
 CROSS JOIN table2 
      WHERE p_id < p.last_id
)
SELECT c.path, p_id::text, c.amount, 'n-1' as type
  FROM table1_combos c
  JOIN table2 t ON c.amount = p_amt
UNION ALL
SELECT r_id::text, c.path, c.amount, '1-n' as type
  FROM table2_combos c
  JOIN table1 t ON r_amt = c.amount; 

As for performance the fundamental problem is that you will have a great deal of space to search. There isn't a reasonably easy way to do that, unfortunately. The combination space is extremely large, and it gets much larger with each additional row you add.

Hmmm revisiting my estimates. A 10 row table should generate 6.3 million combinations, while an 11 row table should generate 68.6 million combinations. In PostgreSQL you can check the number of expected combinations with the following SQL statement:

select sum(factorial(11)/factorial(f)) from generate_series(1, 11) f;

For an 11 row table. Note as follows:

select sum(factorial(100)/factorial(f)) from generate_series(1, 100) f;

         sum                                                                    

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------
 1603607401161831447335715093560135199544316103019165207641822220922316539151565
30909999021448995531507013709811500779735358328288932830176709764490323163992001
.00000000000000000000
(1 row)

If you have a 100 row table you will be waiting for a while.....

Now you might be able to further address this by bounding the CTE itself saying "stop when you reach the max of the other table" for example.

Upvotes: 2

Related Questions