123_stack
123_stack

Reputation: 3

SQL: How to join tables with 1+ millions of records

I want to join two tables ("products" table has 1.5 millions of records) using the following query, but after 15 minutes the query was still running and my pc was overheating (it's a lenovo v330-14ikb with 8gb of RAM), so I stopped it.

I am very new to indexes, and I tried by creating the followings:

This is the query:

SELECT a.customer_id, (SUM(a.revenues) / SUM(b.costs) :: FLOAT) AS roi
FROM orders a
JOIN products b
ON a.customer_id = b.customer_id
WHERE a.customer_id IN (
    SELECT customer_id FROM (SELECT 
        customer_id,
        COUNT(*) AS n_products
    FROM products
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5) x
)
GROUP BY a.customer_id  
ORDER BY roi DESC

The output should return the ratio of revenues/costs for the top 5 customers by number of products they bought.

I am using pgadmin. Can someone explain me how to speed up and make it compile? Thank you in advance.

Upvotes: 0

Views: 2837

Answers (1)

user330315
user330315

Reputation:

I don't think you need to aggregate twice as far as I can tell.

select customer_id, roi
from (
  select o.customer_id, 
         sum(o.revenues) / sum(p.costs)::float as roi, 
         count(*) as n_products
  from orders o
    join products p on o.customer_id = p.customer_id
  group by o.customer_id  
  order by n_products
  limit 5
) t
order by roi desc

Alternatively try to aggregate the two tables separately, then join the results:

select o.customer_id, o.revenues / p.costs::numeric as roi
from (
  select customer_id, sum(revenues) as revenues
  from orders
  group by customer_id
) o
  join (
    select customer_id, sum(costs) as costs, count(*) n_products
    from products
    group by customer_id
  ) p on p.customer_id = o.customer_id
order by p.n_products desc
limit 5  

Upvotes: 1

Related Questions