KTY
KTY

Reputation: 719

How to get the count of distinct values until a time period Impala/SQL?

I have a raw table recording customer ids coming to a store over a particular time period. Using Impala, I would like to calculate the number of distinct customer IDs coming to the store until each day. (e.g., on day 3, 5 distinct customers visited so far)

Here is a simple example of the raw table I have:

 Day    ID
 1    1234
 1    5631
 1    1234
 2    1234
 2    4456
 2    5631
 3    3482
 3    3452
 3    1234
 3    5631
 3    1234

Here is what I would like to get:

 Day    Count(distinct ID) until that day
 1        2
 2        3
 3        5

Is there way to easily do this in a single query?

Upvotes: 1

Views: 1381

Answers (2)

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48207

Not 100% sure if will work on impala

But if you have a table days. Or if you have a way of create a derivated table on the fly on impala.

CREATE TABLE days ("DayC" int);

INSERT INTO days
    ("DayC")
VALUES  (1), (2), (3);

OR

 CREATE TABLE days AS
     SELECT DISTINCT "Day"
     FROM sales

You can use this query

SqlFiddleDemo in Postgresql

SELECT "DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN days 
WHERE "Day" <= "DayC"
GROUP BY "DayC"

OUTPUT

| DayC | count |
|------|-------|
|    1 |     2 |
|    2 |     3 |
|    3 |     5 |

UPDATE VERSION

SELECT T."DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN (SELECT DISTINCT "Day" as "DayC" FROM sales) T
WHERE "Day" <= T."DayC"
GROUP BY T."DayC"

Upvotes: 3

Vance
Vance

Reputation: 897

try this one:

 select day, count(distinct(id)) from yourtable group by day

Upvotes: -1

Related Questions