Reputation: 93
I have the same problem presented in the following question:
Forward Fill New Row to Account for Missing Dates
The difference is that I need to fill in the hours between two timestamps that fall on different days, for example between 2019-12-26 22:00:00 and 2019-12-27 09:00:00. In this case the following function fails because t1.hour - t2.hour is negative, so the range is empty:
    def missing_hours(t1, t2):
        return [t1 + relativedelta(hours=-x) for x in range(1, t1.hour - t2.hour)]

    missing_hours_udf = udf(missing_hours, ArrayType(TimestampType()))
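The failure can be reproduced without Spark. This is only a minimal sketch, assuming t1 is the later timestamp (2019-12-27 09:00:00) and t2 the earlier one; the names hour_gap and offsets are made up for illustration:

```python
from datetime import datetime

t1 = datetime(2019, 12, 27, 9, 0, 0)   # later timestamp (assumed to be t1)
t2 = datetime(2019, 12, 26, 22, 0, 0)  # earlier timestamp (assumed to be t2)

# Comparing only the hour fields ignores the day boundary: 9 - 22 = -13
hour_gap = t1.hour - t2.hour

# range(1, -13) yields nothing, so the list comprehension returns an empty list
offsets = list(range(1, hour_gap))

print(hour_gap, offsets)
```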
I tried modifying it in several ways (for example, using (t1-t2).hour), but none of them worked.
Does anyone know how to properly modify the function above to get the desired result?
Upvotes: 1
Views: 162
Reputation: 1932
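Here is an updated function that fills in the hours between two timestamps, even when they fall on different days. It computes the total number of hours from the timedelta t1 - t2 instead of comparing the hour fields: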
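    from dateutil.relativedelta import relativedelta

    def missing_hours(t1, t2):
        # Assumes t1 is the later timestamp and t2 the earlier one
        diff = t1 - t2
        days, seconds = diff.days, diff.seconds
        # Total whole hours between the two timestamps, across day boundaries
        hours = days * 24 + seconds // 3600
        return [t1 + relativedelta(hours=-x) for x in range(1, hours)]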
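As a quick check against the timestamps from the question, here is a rough standalone sketch. It swaps relativedelta for the standard library's timedelta so it runs without dateutil (that swap is my assumption, not part of the function above), and the variable name filled is just for illustration:

```python
from datetime import datetime, timedelta

def missing_hours(t1, t2):
    """Return the whole-hour timestamps strictly between t2 and t1, newest first."""
    diff = t1 - t2
    # Total whole hours across day boundaries
    hours = diff.days * 24 + diff.seconds // 3600
    # timedelta stands in for relativedelta here (assumption: stdlib only)
    return [t1 - timedelta(hours=x) for x in range(1, hours)]

t1 = datetime(2019, 12, 27, 9, 0, 0)   # later timestamp
t2 = datetime(2019, 12, 26, 22, 0, 0)  # earlier timestamp

filled = missing_hours(t1, t2)
print(len(filled))   # 10
print(filled[0])     # 2019-12-27 08:00:00
print(filled[-1])    # 2019-12-26 23:00:00
```

Both endpoints are excluded, which seems to match the original intent of generating only the missing rows. If the arguments are passed in the opposite order, the hour count is negative and the result is an empty list.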
Upvotes: 1