Reputation: 347
If I have a table like this in Hive:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
greg 0 5
How can I duplicate each row in a select statement by the sampling_rate
column so that it would look like this:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
Upvotes: 1
Views: 316
Reputation: 38325
Using space() you can produce a string of spaces with lenght=sampling_rate-1 , split it and explode with lateral view, it will duplicate rows.
Demo:
with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)
select t.*
from your_table t --use your table here
lateral view explode(split(space(t.sampling_rate-1),' '))e
Result:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
Upvotes: 3