How to get the count of distinct values until a time period Impala/SQL?

Question

I have a raw table recording customer ids coming to a store over a particular time period. Using Impala, I would like to calculate the number of distinct customer IDs coming to the store until each day. (e.g., on day 3, 5 distinct customers visited so far)

Here is a simple example of the raw table I have:

Here is what I would like to get:

 Day    Count(distinct ID) until that day
 1        2
 2        3
 3        5

Is there way to easily do this in a single query?

Juan Carlos Oropeza · Accepted Answer

Not 100% sure if will work on impala

But if you have a table days. Or if you have a way of create a derivated table on the fly on impala.

CREATE TABLE days ("DayC" int);

INSERT INTO days
    ("DayC")
VALUES  (1), (2), (3);

OR

 CREATE TABLE days AS
     SELECT DISTINCT "Day"
     FROM sales

You can use this query

SqlFiddleDemo in Postgresql

SELECT "DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN days 
WHERE "Day" <= "DayC"
GROUP BY "DayC"

OUTPUT

| DayC | count |
|------|-------|
|    1 |     2 |
|    2 |     3 |
|    3 |     5 |

UPDATE VERSION

SELECT T."DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN (SELECT DISTINCT "Day" as "DayC" FROM sales) T
WHERE "Day" <= T."DayC"
GROUP BY T."DayC"

How to get the count of distinct values until a time period Impala/SQL?

Answers (2)

Related Questions