Axis

Reputation: 2132

Redshift: find the level distribution of each item (total number of items per level)

The question is updated!!

I have a database that stores customers' item upgrades, with a DateTime and the upgraded level. When a customer upgrades an item on day 1 and again on day 5, I cannot see any information for the days in between.

Example data: (Upgrade table)

day              customer    items        levels
01/01/2019       a           item1        0
01/01/2019       b           item1        0
02/01/2019       a           item1        1
03/01/2019       b           item1        1
04/01/2019       a           item1        2
05/01/2019       b           item1        2
05/01/2019       c           item1        0
06/01/2019       b           item1        3
07/01/2019       d           item2        0
08/01/2019       NULL        NULL         NULL
09/01/2019       b           item1        4
10/01/2019       NULL        NULL         NULL
11/10/2019       b           item1        5
11/10/2019       a           item1        3
11/10/2019       a           item1        4

Example data: (daily table for a user)

 day              time-spent      
01/01/2019           11       
02/01/2019           34        
03/01/2019           56        
04/01/2019           78       
11/01/2019           3         
14/01/2019           13
22/01/2019           30        

My challenge is to find the total number of items at each level per day. For example, how many level 0 item1 do I have per day?

01/01/2019     2            
02/01/2019     1      
03/01/2019     0     
04/01/2019     0      
05/01/2019     1             
06/01/2019     1     
07/01/2019     1      
08/01/2019     1     
09/01/2019     1      
10/01/2019     1      
11/10/2019     1      

How many level 1 item1 do I have per day?

01/01/2019     0            
02/01/2019     1      
03/01/2019     2     
04/01/2019     1      
05/01/2019     0             
06/01/2019     0     
07/01/2019     0      
08/01/2019     0     
09/01/2019     0      
10/01/2019     0      
11/10/2019     0  

And so on for all twenty levels per item! (I need to find the distribution of item levels.)

What I tried:

    SELECT *
    FROM (
           SELECT date(timestamp) day,
                  user_id,
                  item_id,
                  level,
-- Left join the daily table and get last value of an item per customers
                  last_value(level) IGNORE NULLS OVER (PARTITION BY user_id ORDER BY day
                    ROWS UNBOUNDED PRECEDING ) level_r
           FROM daily d -- I have a calendar dates from and all customers in this table 
                  LEFT JOIN (
               SELECT user_id,
                      date(TIMESTAMP) t,
                      item_id,
                      level,
                      RANK()
                      OVER (PARTITION BY user_id,date(timestamp), item_id ORDER BY TIMESTAMP DESC ) transaction_rank
               FROM updated 
               WHERE item_id
             ) u ON u.user_id = d.user_id AND u.t = d.day AND transaction_rank = 1
           GROUP BY
             1, 2, 3, 4)

The problem is that, because of the null values, this works if I have one customer but not if I have multiple customers.

How can I find the level distribution for items?

EDIT: I wanted to add some more information to make this clearer. Even if no one changed their item level on the previous day, I still need the counts for the day after.

I need to know how many items I have at each level as of any chosen date, taking the whole preceding history into account.

EDIT-2: I do not need to join the upgrade table to daily. I need to see all values of the upgrade table in a generate_series-style result, but when I create the generate_series I get an error.

Upvotes: 2

Views: 309

Answers (3)

GMB

Reputation: 222482

Let's start by building a base query, that gives us the current level of each customer on each and every day.

This works by recovering all available days in the table, then all prior records to each day, and finally aggregating to compute the current level of each (customer, item) tuple.

The underlying assumption is that the level of a tuple cannot decrease (this is consistent with your sample data).

select
    d.day,
    t.customer, 
    t.items,
    max(t.levels) max_level
from 
    (select distinct day from mytable) d
    left join mytable t on t.day <= d.day
group by 
    d.day, 
    t.customer, 
    t.items
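
One caveat: the base query only lists days that appear in the table itself. As a rough sketch, assuming a separate calendar of dates is available, the same join can be driven from that calendar so that days without any upgrade still show the carried-over level. The `calendar` and `upgrades` CTEs below are hypothetical inline stand-ins (not the real tables), and the dates are made up for illustration:

```sql
-- Sketch only: "calendar" and "upgrades" are hypothetical inline stand-ins
-- for a real date table and the upgrade table.
with calendar as (
    select '2019-01-01'::date as day union all
    select '2019-01-02'::date union all
    select '2019-01-03'::date union all   -- no upgrade happens on this day
    select '2019-01-04'::date
),
upgrades as (
    select '2019-01-01'::date as day, 'a' as customer, 'item1' as items, 0 as levels union all
    select '2019-01-01'::date, 'b', 'item1', 0 union all
    select '2019-01-02'::date, 'a', 'item1', 1 union all
    select '2019-01-04'::date, 'b', 'item1', 1
)
select
    c.day,
    u.customer,
    u.items,
    max(u.levels) as max_level          -- current level, assuming levels never decrease
from calendar c
join upgrades u on u.day <= c.day       -- every upgrade on or before this calendar day
group by c.day, u.customer, u.items
order by c.day, u.customer, u.items;
```

Here 2019-01-03 has no upgrade rows, yet it still appears in the result with the levels carried over from the earlier days. An inner join is used, so calendar days before the first upgrade would simply be absent.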

From there on, it's quite easy to generate the required output, by adding another level of (conditional) aggregation.

This query gives you the distribution of customers of item1 per date across levels 0 to 5:

select 
    day,
    sum( (max_level = 0)::int ) level0,
    sum( (max_level = 1)::int ) level1,
    sum( (max_level = 2)::int ) level2,
    sum( (max_level = 3)::int ) level3,
    sum( (max_level = 4)::int ) level4,
    sum( (max_level = 5)::int ) level5
from (
    select
        d.day,
        t.customer, 
        t.items,
        max(t.levels) max_level
    from 
        (select distinct day from mytable) d
        left join mytable t on t.day <= d.day
    group by 
        d.day, 
        t.customer, 
        t.items
) x
where items = 'item1'
group by day
order by day

Yields:

| day                      | level0 | level1 | level2 | level3 | level4 | level5 |
| ------------------------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 2019-01-01T00:00:00.000Z | 2      | 0      | 0      | 0      | 0      | 0      |
| 2019-02-01T00:00:00.000Z | 1      | 1      | 0      | 0      | 0      | 0      |
| 2019-03-01T00:00:00.000Z | 0      | 2      | 0      | 0      | 0      | 0      |
| 2019-04-01T00:00:00.000Z | 0      | 1      | 1      | 0      | 0      | 0      |
| 2019-05-01T00:00:00.000Z | 1      | 0      | 2      | 0      | 0      | 0      |
| 2019-06-01T00:00:00.000Z | 1      | 0      | 1      | 1      | 0      | 0      |
| 2019-07-01T00:00:00.000Z | 1      | 0      | 1      | 1      | 0      | 0      |
| 2019-08-01T00:00:00.000Z | 1      | 0      | 1      | 1      | 0      | 0      |
| 2019-09-01T00:00:00.000Z | 1      | 0      | 1      | 0      | 1      | 0      |
| 2019-10-01T00:00:00.000Z | 1      | 0      | 1      | 0      | 1      | 0      |
| 2019-11-10T00:00:00.000Z | 1      | 0      | 0      | 0      | 1      | 1      |

Demo on DB Fiddle
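
With twenty levels per item, writing one conditional sum per level gets long. A possible alternative, sketched here under the same assumption that levels never decrease, is to return one row per (day, item, level) instead of one column per level. The `upgrades` CTE is a hypothetical inline stand-in for the real table, and `item_level` and `customers` are names chosen for this example:

```sql
-- Sketch only: "upgrades" is a hypothetical inline stand-in for the upgrade table.
with upgrades as (
    select '2019-01-01'::date as day, 'a' as customer, 'item1' as items, 0 as levels union all
    select '2019-01-01'::date, 'b', 'item1', 0 union all
    select '2019-01-02'::date, 'a', 'item1', 1 union all
    select '2019-01-03'::date, 'b', 'item1', 1 union all
    select '2019-01-04'::date, 'a', 'item1', 2
),
current_level as (
    -- same idea as the base query: the highest level reached on or before each day
    select d.day, u.customer, u.items, max(u.levels) as max_level
    from (select distinct day from upgrades) d
    join upgrades u on u.day <= d.day
    group by d.day, u.customer, u.items
)
select
    day,
    items,
    max_level as item_level,
    count(*)  as customers       -- customers currently at this level
from current_level
group by day, items, max_level
order by day, items, max_level;
```

Note that a level with no customers on a given day produces no row in this form, rather than a row with a count of 0, so it would need a join against a list of levels if explicit zeros are required.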

Upvotes: 0

Himanshu

Reputation: 3970

Is this what you want?

Select * from 
(Select Day,items,levels,count(*) from dataa group by 
               day,items,levels )
 pivot (count(*) for levels in (0,1,2,3,4,5)) order by day,items;


Upvotes: 0

Gordon Linoff

Reputation: 1269873

Assuming you have at least one row per day, then this seems like aggregation:

select timestamp::date,
       sum( (level = 1)::int ) as level_1,
       sum( (level = 2)::int ) as level_2,
       . . . 
       sum( (level = 20)::int ) as level_20
from daily
group by timestamp::date
order by timestamp::date;

Upvotes: 1
