Gurupraveen
Gurupraveen

Reputation: 181

Hive + loop through query results to build new table

I have table that contains sales data for each day like below:

Date              ItemNo     quantity
-------------------------------------
2017-11-25         123         2
2017-11-25         326         4
2017-11-26         123 .       3
2017-11-27         123         1
2017-11-28         123         5
2017-11-28 .       326 .       1
2017-11-29 .       123         9
2017-11-30 .       123         7

I have restricted only to 2 items, now with this table I want to build weekly aggregated table like below. Don't worry about week column, I have another logic to get that column:

Weekly aggregated Table:

Week   Item    Total Quantity   Sat Sun Mon Wed Tue Thu Fri 
4       123      27              3   1 . 5  9 .  7   0   2

Upvotes: 2

Views: 2615

Answers (2)

Bala
Bala

Reputation: 11244

Here is another approach

select * from d1;
+-------------+----------+---------+--+
|  d1.bdate   | d1.item  | d1.qty  |
+-------------+----------+---------+--+
| 2017-11-25  | 123      | 2       |
| 2017-11-25  | 123      | 2       |
| 2017-11-25  | 326      | 4       |
| 2017-11-26  | 123      | 3       |
| 2017-11-27  | 123      | 1       |
| 2017-11-28  | 123      | 5       |
| 2017-11-28  | 326      | 1       |
| 2017-11-29  | 123      | 9       |
| 2017-11-30  | 123      | 7       |
+-------------+----------+---------+--+

select  weekofyear(bdate) as week, sum(qty) as total,
        item, from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE') as wday,
        map(from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE'), sum(qty)) as gmap
from d1
group by weekofyear(bdate), item, 
from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE')

 +-------+--------+-------+-------+------------+--+
| week  | total  | item  | wday  |    gmap    |
+-------+--------+-------+-------+------------+--+
| 47    | 4      | 123   | Sat   | {"Sat":4}  |
| 47    | 3      | 123   | Sun   | {"Sun":3}  |
| 47    | 4      | 326   | Sat   | {"Sat":4}  |
| 48    | 1      | 123   | Mon   | {"Mon":1}  |
| 48    | 7      | 123   | Thu   | {"Thu":7}  |
| 48    | 5      | 123   | Tue   | {"Tue":5}  |
| 48    | 9      | 123   | Wed   | {"Wed":9}  |
| 48    | 1      | 326   | Tue   | {"Tue":1}  |
+-------+--------+-------+-------+------------+--+

select week, item, sum(total) as totals,
        collect_list(s.gmap['Sun']) as Sun,
        collect_list(s.gmap['Mon']) as Mon,
        collect_list(s.gmap['Tue']) as Tue,
        collect_list(s.gmap['Wed']) as Wed,
        collect_list(s.gmap['Thu']) as Thu,
        collect_list(s.gmap['Fri']) as Fri,
        collect_list(s.gmap['Sat']) as Sat
        from    
        (select  weekofyear(bdate) as week, sum(qty) as total,
            item, from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE') as wday,
            map(from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE'), sum(qty)) as gmap
            from d1
            group by weekofyear(bdate), item, from_unixtime(unix_timestamp(bdate,'yyyy-MM-dd'),'EEE')

        ) s
        group by week, item

+-------+-------+---------+------+------+------+------+------+------+------+--+
| week  | item  | totals  | sun  | mon  | tue  | wed  | thu  | fri  | sat  |
+-------+-------+---------+------+------+------+------+------+------+------+--+
| 47    | 123   | 7       | [3]  | []   | []   | []   | []   | []   | [4]  |
| 47    | 326   | 4       | []   | []   | []   | []   | []   | []   | [4]  |
| 48    | 123   | 22      | []   | [1]  | [5]  | [9]  | [7]  | []   | []   |
| 48    | 326   | 1       | []   | []   | [1]  | []   | []   | []   | []   |
+-------+-------+---------+------+------+------+------+------+------+------+--+

Upvotes: 0

leftjoin
leftjoin

Reputation: 38325

Use CASE to calculate Day columns with aggregation (max or min) to get single row for each Item:

 select 
  Item,
  max(total_quantity) total_quantity, 
  max(case when date_format(your_date,'EEE') = 'Sat' then quantity end) Sat,
  max(case when date_format(your_date,'EEE') = 'Sun' then quantity end) Sun,
  max(case when date_format(your_date,'EEE') = 'Mon' then quantity end) Mon,
  max(case when date_format(your_date,'EEE') = 'Wed' then quantity end) Wed,
  max(case when date_format(your_date,'EEE') = 'Tue' then quantity end) Tue,
  max(case when date_format(your_date,'EEE') = 'Thu' then quantity end) Thu,
  max(case when date_format(your_date,'EEE') = 'Fri' then quantity end) Fri
from (--subquery for calculating total_quantity by Item
      select s.*,
             sum(quantity) over(partition by Item) total_quantity
        from sales_data
     )s
group by Item;

If it can be more than one record per date, item and you need to summarize them, then use sum() instead of max() for Day columns.

Upvotes: 1

Related Questions