Reputation: 23
I am using Hive, and need to calculate the difference (in days) between consecutive dates stored in an array contained in each row of a table, in order to get the gaps between recorded times. Each row is for one customer, and contains their transaction dates. For example (last column is the desired output):
customer_id | dates |output
--------------------------------------------------------------------------
0001 | ["2016-09-01","2017-01-01","2017-02-05","2017-11-01"]|[122,35,269]
Aim is to iterate through all rows in the table producing this new column. Customers will have different numbers of transactions, so I will need to loop through the list of dates.
Upvotes: 0
Views: 615
Reputation: 1584
Assume the input table is array_test
and output table is output_table
. Also, the array_test contains columns customer_id string
and dates Array<string>
The data i inserted in input table is:
insert into array_test select "0001",ARRAY("2016-09-01","2017-01-01","2017-02-05","2017-11-01")
insert into array_test select "0001",ARRAY("2016-09-01","2017-01-01","2017-02-05","2017-11-02")
The output table create statement i used is:
CREATE TABLE output_table(customer_id string,dates array<string>,output array<int>);
Then use the below query to select from input table and insert into output table:
insert into output_table select customer_id,dates, ARRAY(datediff(to_date(dates[1]), to_date(dates[0])),datediff(to_date(dates[2]), to_date(dates[1])),datediff(to_date(dates[3]), to_date(dates[2]))) from array_test;
Below is the output:
hive> select output from output_table;
OK
[122,35,269]
[122,35,269]
[122,35,270]
[122,35,270]
Time taken: 0.071 seconds, Fetched: 4 row(s)
Upvotes: 1