How to duplicate row based on int column

Question

If I have a table like this in Hive:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
greg         0              5

How can I duplicate each row in a select statement by the sampling_rate column so that it would look like this:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
emma         0              3
emma         0              3
greg         0              5
greg         0              5
greg         0              5
greg         0              5
greg         0              5

leftjoin · Accepted Answer

Using space() you can produce a string of spaces with lenght=sampling_rate-1 , split it and explode with lateral view, it will duplicate rows.

Demo:

with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)

select t.*
  from your_table t --use your table here
       lateral view explode(split(space(t.sampling_rate-1),' '))e

Result:

name     impressions   sampling_rate
------------------------------------
paul        34              1
emma         0              3
emma         0              3
emma         0              3
greg         0              5
greg         0              5
greg         0              5
greg         0              5
greg         0              5

How to duplicate row based on int column

Answers (1)

Related Questions