A Saraf
A Saraf

Reputation: 275

Transposing data in Hive

I have a Hive table with data in the below format

day       class  start_time  count  kpi1  kpi2  kpi3  kpi4  ...  kpi160
-----------------------------------------------------------------------
20161010  abc    00          12     1     0     null  0     ...

I want to write a hive query to fetch the data in the format below using some calculations like max, min , and avg.

day       class  start_time  count  kpi_name  kpi_max   kpi_min   kpi_avg
-----------------------------------------------------------------------
20161010  abc    00          12     kpi1      max(kpi1) min(kpi1) avg(kpi1)
20161010  abc    00          12     kpi2      max(kpi2) min(kpi2) avg(kpi2)

Please suggest a solution to fetch the data in this format.

Thanks.

Upvotes: 1

Views: 591

Answers (2)

o-90
o-90

Reputation: 17585

You need to put all the kpis in a map, explode the map to create one column, and then aggregate.

Example:

Data:

+---------+------+-----------+-------+-----+-----+------+------+------+------+
|day_     |class |start_time |count_ |kpi0 |kpi1 | kpi2 | kpi3 | kpi4 | kpi5 |
+---------+------+-----------+-------+-----+-----+------+------+------+------+
|20161010 |abc   |00         |12     |1    |2    |3     |8     |9     |6     |
+---------+------+-----------+-------+-----+-----+------+------+------+------+
|20161010 |abc   |00         |12     |4    |5    |null  |6     |10    |null  |
+---------+------+-----------+-------+-----+-----+------+------+------+------+

Query:

SELECT day_
  , class
  , start_time
  , count_
  , kpi_type
  , MAX(vals) AS max_vals
  , MIN(vals) AS min_vals
  , AVG(vals) AS avg_vals
FROM (
  SELECT day_, class, start_time, count_, kpi_type, vals
  FROM database.table
  LATERAL VIEW EXPLODE(MAP('kpi0', kpi0
                         , 'kpi1', kpi1
                         , 'kpi2', kpi2
                         , 'kpi3', kpi3
                         , 'kpi4', kpi4
                         , 'kpi5', kpi5)) et AS kpi_type, vals ) x
GROUP BY day_, class_, start_time, count_, kpi_type

Output:

+---------+------+-----------+-------+---------+---------+---------+---------+
|day_     |class |start_time |count_ |kpi_type |max_vals |min_vals |avg_vals |
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi0     |4        |1        |2.5      | 
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi1     |5        |2        |3.5      |
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi2     |3        |3        |3.0      |
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi3     |8        |6        |7.0      |
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi4     |10       |9        |9.5      |
+---------+------+-----------+-------+---------+---------+---------+---------+
|20161010 |abc   |00         |12     |kpi5     |6        |6        |6.0      |
+---------+------+-----------+-------+---------+---------+---------+---------+

Upvotes: 1

Arunakiran Nulu
Arunakiran Nulu

Reputation: 2089

If you want to get the min,max,avg , you have to specify the group By column , let's suppose you want to group by day.

SELECT day,
       class,
       start_time,
       count,
       kpi1,
       MAX(kpi1) as max_kpi1,
       MIN(kpi1) as min_kpi1,
       AVG(kpi1) as avg_kpi1

FROM table

GROUP BY day  

Upvotes: 0

Related Questions