Reputation: 79021
I need help with a query. The task is to find rows in one table whose amounts sum to the value of a single row in the other table, and vice versa.
An illustration of the example:
Table 1:

 r_id | r_date    | r_amt
------+-----------+-------
    1 | 2/23/2012 |   200
    2 | 3/21/2012 |   100
    3 | 4/12/2012 |   300
    4 | 4/18/2012 |   250
    5 | 5/20/2012 |   130
    6 | 5/21/2012 |   740

Table 2:

 p_id | p_date    | p_amt
------+-----------+-------
    1 | 3/22/2012 |   450
    2 | 5/25/2012 |   530
    3 | 5/26/2012 |   700
    4 | 5/26/2012 |    40
This test data is constructed so that a few rows of table 1 sum up to one row of table 2, and vice versa.
I want a query that analyses the above data so that the sum of several records in one table is matched against a single row in the other.
Once the analysis is complete, it should feed the results into a new table like this.
Let's call this table match:
 m_id | tbl1  | tbl2 | match_type
------+-------+------+-----------
    1 | 1,4   | 1    | n-1
    2 | 2,3,5 | 2    | n-1
    3 | 6     | 3,4  | 1-n
Right now I am calculating the sums for each table into a temporary table, then comparing against that table to get the above result. But for more than 10 rows the queries become really slow and hang my development server.
What is the efficient way to perform this task?
Upvotes: 1
Views: 1093
Reputation: 26464
OK, so here is a rough answer. I have not tested it; recursive CTEs have some odd gotchas and it is possible I ran into one, but this should get you going. There may also be some performance tuning possible.
The algorithm roughly goes as follows:
Generate the combinations of rows in each table (the ordering bound in the recursive CTE keeps each subset from appearing more than once)
Compare the summed amount of each combination on one side with each single row on the other
The first will be done with recursive CTE's. The second with a simple join.
WITH RECURSIVE table1_combos AS (
    SELECT r_id AS last_id, r_id::text AS path, r_amt AS amount
      FROM table1
    UNION ALL
    SELECT r.r_id AS last_id, p.path || ',' || r.r_id::text, p.amount + r.r_amt
      FROM table1_combos p
      JOIN table1 r ON r.r_id < p.last_id
),
table2_combos AS (
    SELECT p_id AS last_id, p_id::text AS path, p_amt AS amount
      FROM table2
    UNION ALL
    SELECT t.p_id AS last_id, p.path || ',' || t.p_id::text, p.amount + t.p_amt
      FROM table2_combos p
      JOIN table2 t ON t.p_id < p.last_id
)
SELECT c.path AS tbl1, t.p_id::text AS tbl2, c.amount, 'n-1' AS match_type
  FROM table1_combos c
  JOIN table2 t ON c.amount = t.p_amt
UNION ALL
SELECT t.r_id::text, c.path, c.amount, '1-n'
  FROM table2_combos c
  JOIN table1 t ON t.r_amt = c.amount;

Note that RECURSIVE appears only once, right after WITH; it applies to every CTE in the list. The CROSS JOIN plus WHERE has been written as a plain JOIN with the bound in the ON clause, and the column references are qualified so they are not ambiguous.
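If you want to sanity-check the SQL on small inputs, the same subset-sum matching can be sketched in plain Python. This is only an illustration, not part of the answer's SQL; the `find_matches` helper and the list layout are my own:

```python
from itertools import combinations

def find_matches(rows_a, rows_b):
    """Return (ids_from_a, id_from_b) pairs where a subset of rows_a
    sums to the amount of a single row in rows_b."""
    # index the single-row side by amount for O(1) lookups
    by_amount = {}
    for b_id, b_amt in rows_b:
        by_amount.setdefault(b_amt, []).append(b_id)
    matches = []
    # brute-force every non-empty subset of rows_a, smallest first
    for size in range(1, len(rows_a) + 1):
        for combo in combinations(rows_a, size):
            total = sum(amt for _, amt in combo)
            for b_id in by_amount.get(total, []):
                matches.append(([r_id for r_id, _ in combo], b_id))
    return matches

# the example data from the question, as (id, amount) pairs
table1 = [(1, 200), (2, 100), (3, 300), (4, 250), (5, 130), (6, 740)]
table2 = [(1, 450), (2, 530), (3, 700), (4, 40)]
```

Calling `find_matches(table1, table2)` gives the n-1 matches and `find_matches(table2, table1)` the 1-n matches. It enumerates all 2^n - 1 subsets, so it is a cross-check for small data, not a fix for the performance problem.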
As for performance, the fundamental problem is that you have a great deal of space to search, and there is unfortunately no easy way around that. The search space is extremely large, and it grows much larger with each additional row.
Hmm, revisiting my estimates: a 10-row table generates about 6.2 million entries in that space, and an 11-row table about 68.6 million. In PostgreSQL you can check the expected number with the following SQL statement:
select sum(factorial(11)/factorial(f)) from generate_series(1, 11) f;
That is for an 11-row table. For a 100-row table:
select sum(factorial(100)/factorial(f)) from generate_series(1, 100) f;
sum
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------
1603607401161831447335715093560135199544316103019165207641822220922316539151565
30909999021448995531507013709811500779735358328288932830176709764490323163992001
.00000000000000000000
(1 row)
If you have a 100-row table, you will be waiting for quite a while.....
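The same figures can be reproduced outside the database. Here is a small Python equivalent of the factorial sum above (the `search_space` name is mine):

```python
from math import factorial

def search_space(n):
    # sum(n!/f!) for f = 1..n: the number of non-empty ordered
    # sequences of distinct rows drawn from an n-row table
    return sum(factorial(n) // factorial(f) for f in range(1, n + 1))
```

`search_space(10)` returns 6,235,301 and `search_space(11)` returns 68,588,312 (about 68.6 million), so each extra row multiplies the space by roughly n+1.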
Now you might be able to address this further by bounding the CTE itself, for example: stop extending a combination once its running sum exceeds the maximum amount in the other table.
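A sketch of that bounding idea, in Python rather than SQL (the `bounded_combos` helper is hypothetical, and the bound assumed here is the other table's maximum amount):

```python
def bounded_combos(amounts, limit):
    """Enumerate (index_subset, total) pairs whose running total never
    exceeds `limit`; over-limit extensions are never generated."""
    results = []

    def extend(start, subset, total):
        for i in range(start, len(amounts)):
            new_total = total + amounts[i]
            if new_total > limit:
                # skip this element: no superset built through it can
                # ever come back under the limit
                continue
            results.append((subset + [i], new_total))
            extend(i + 1, subset + [i], new_total)

    extend(0, [], 0)
    return results
```

With the question's table 1 amounts and a limit of 700 (table 2's maximum), this generates 25 combinations instead of the full 63, and the saving grows quickly with more rows.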
Upvotes: 2