macroman
macroman

Reputation: 23

Snowflake SQL Counting and Summing from Another Table based on Matching Value

I am trying to do an SQL query in snowflake to count up values from a different table based on a value from the first table. For example, I find a value that matches in table1 and table2, and I count the rows that match it from table 2.

When I attempt to do this, the query runs but the resulting data is incorrectly counting. It looks as if it is doing some sort of odd breakout/cumulative counting. I've tried reading examples and googling for examples like this but what's interesting is in non-snowflake SQL, the examples I've seen do not need to "group by" the count I'm trying to select.

In snowflake it gives me an error unless I group the (select count() from... But doing this seems to return a result that is over counting/summing the value by a lot, when I remove that (select count()... row and the group by, it does everything else correctly so I'm guessing there's something specifically wrong there.

select 
    "naming_db"."public"."username".name as "Name",
    "naming_db"."public"."addresses".address as "Address",
    count(*) as "Shoe Count",
    (select count(*) from "data_db"."records"."orderID" where 
    "data_db"."records"."orderID".ID = "data_db"."records"."userinfo".ID) as "Orders",
    sum("data_db"."records"."userinfo".revenue) as "Spend"
from "data_db"."records"."userinfo"
    join "naming_db"."public"."username" on "naming_db"."public"."username".name = "data_db"."records"."userinfo".nameid
    join "naming_db"."public"."addresses" on "naming_db"."public"."addresses".address = "data_db"."records"."userinfo".addressid
    full join "data_db"."records"."orderID" on "data_db"."records"."userinfo".ID = "data_db"."records"."orderID".ID
group by 1,2,4

The (select count(*) from... line I think is the issue but I can't seem to figure out how to write it in a way that would make the query run and return the results I want.

Just edited the query to make more sense. Was trying to obfuscate some of the info there but I realized it's too confusing sorry!

Here's the results that I would like. I get this when I remove the "Orders" line - the one that is (select count(*) from...

Correct but missing Orders

Here's the results that I'm getting when I add in that line for "Orders": Incorrect summing and breaks out more rows

Upvotes: 1

Views: 3403

Answers (3)

Hans Henrik Eriksen
Hans Henrik Eriksen

Reputation: 2880

After you rearranged the query, it looks like you have:

  1. userinfo is the User table that contains one line per user and also has an accumulated revenue amount per user
  2. orderID is the Order table with quantities per customer per product
  3. username and addresses are only used to display the user contact info (unimportant).

It is still hard to understand what your aggregations should show, so I've tried to guess:

  1. Shoe count is out in the blue in your revised query. It counts the number of accumulated lines per user, which is now the number of order lines. Shoe data used to be in the query, but you removed it, so Shoe Count makes no sense now.
  2. Orders most likely is the total number of orders per user, also the number of lines in the order table for that user. But there is no difference to Shoe Count, which makes me wonder why.
  3. Spend is the revenue per user.

The use of a FULL JOIN is confusing. It indicates that some orders are for non-registered users - not existing in the userinfo table. This doesn't make sense to me. Your examples indicates that all orders are from users in userinfo, so there should be no need for a FULL JOIN.
Why do you need it?

The solution should be really simple. Please advise me if I didn't understand your requirements:

SELECT
  ANY_VALUE(n.name) AS "Name",
  ANY_VALUE(a.address) AS "Address",
  COUNT(*) AS "Shoe Count", 
  COUNT(o.ID) AS "Orders",
  ANY_VALUE(u.revenue) AS "Spend"
FROM "data_db"."records"."userinfo" u
INNER JOIN "naming_db"."public"."username" n ON n.name = u.nameid
INNER JOIN "naming_db"."public"."addresses" a ON a.address = u.addressid
INNER JOIN "data_db"."records"."orderID" o ON o.ID = u.ID
GROUP BY u.ID

Upvotes: 1

Simeon Pilgrim
Simeon Pilgrim

Reputation: 26140

Your sub-select is a correlated sub-query which snow has limited support for. So if you pivot to a CTE which does the count for all ID's then join to that you should get what you are asking for..

WITH order_counts AS (
    SELECT ID, count(*) as count data_db.records.orderID group by 1
)
select 
    un.name as Name,
    a.address as Address,
    count(*) as Shoe Count,
    oc.count as Orders,
    sum(ui.revenue) as Spend
from data_db.records.userinfo as ui
join naming_db.public.username as un
    on un.name = ui.nameid
join naming_db.public.addresses as a
    on a.address = ui.addressid
left join order_counts as oc
    on ui.ID = oc.ID
group by 1,2,4

I also inserted some aliases so the SQL was more readable.

Upvotes: 0

Ed Bangga
Ed Bangga

Reputation: 13026

you can use sum() instead of using subquery on your column Shoes.

select 
    "table1".name as "Name",
    "table1".address as "Address",
    count(*) as "Counts",
    sum(case when coalesce("table3".shoeID, 0) != 0 then 1 else 0 end) as "Shoes",
    sum("table2".shoecount) as "Number of Shoes"
from "table2"
join "table1" on "table1".name = "table2".name
join "table1" on "table1".Address = "table2".Address
full join "table3" on "table3".shoeID = "table2".shoeID
group by 1,2,4

Upvotes: 0

Related Questions