Reputation: 43
Sample Data:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 4-Jan-17 1
A 5-Jan-17 0
B 3-Jan-17 1
B 5-Jan-17 0
Need to fill every missing txn_date between date range (1-Jan-17 to 5-Jan-2017). Just like below:
Output should be:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 3-Jan-17 0 (inserted)
A 4-Jan-17 1
A 5-Jan-17 0
B 1-Jan-17 0 (inserted)
B 2-Jan-17 0 (inserted)
B 3-Jan-17 1
B 4-Jan-17 0 (inserted)
B 5-Jan-17 0
Upvotes: 2
Views: 1812
Reputation: 44941
select c.customer
,d.txn_date
,coalesce(t.tag,0) as tag
from (select date_add (from_date,i) as txn_date
from (select date '2017-01-01' as from_date
,date '2017-01-05' as to_date
) p
lateral view
posexplode(split(space(datediff(p.to_date,p.from_date)),' ')) pe as i,x
) d
cross join (select distinct
customer
from t
) c
left join t
on t.customer = c.customer
and t.txn_date = d.txn_date
;
c.customer d.txn_date tag
A 2017-01-01 1
A 2017-01-02 1
A 2017-01-03 0
A 2017-01-04 1
A 2017-01-05 0
B 2017-01-01 0
B 2017-01-02 0
B 2017-01-03 1
B 2017-01-04 0
B 2017-01-05 0
Upvotes: 6
Reputation: 2650
Just have the delta content i.e the missing data in a file(input.txt) delimited with the same delimiter you have mentioned when you created the table.
Then use the load data command to insert this records into the table.
load data local inpath '/tmp/input.txt' into table tablename;
Your data wont be in the order you have mentioned , it would get appended to the last. You could retrieve the order by adding order by txn_date in the select query.
Upvotes: 0