WorkBench

Reputation: 93

Filling in missing dates of a previous day in PySpark

I have the same problem presented in the following question:

Forward Fill New Row to Account for Missing Dates

The difference is that I need to calculate the number of hours between timestamps that fall on two different days, for example between 2019-12-26 22:00:00 and 2019-12-27 09:00:00. In this case the following function fails, because the upper bound of the range becomes negative:


def missing_hours(t1, t2):
    return [t1 + relativedelta(hours=-x) for x in range(1, t1.hour-t2.hour)]

missing_hours_udf = udf(missing_hours, ArrayType(TimestampType()))

I tried to modify it in several ways (for example, using (t1-t2).hour), but none of them worked.
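
For reference, the failure can be reproduced outside Spark. This is a minimal sketch using the two timestamps from the example above; the variable names (hour_field_gap, empty, delta, elapsed_hours) are illustrative only and do not come from the original code.

```python
from datetime import datetime

t1 = datetime(2019, 12, 27, 9, 0, 0)   # later timestamp
t2 = datetime(2019, 12, 26, 22, 0, 0)  # earlier timestamp

# Comparing only the hour-of-day fields ignores the date change.
hour_field_gap = t1.hour - t2.hour       # 9 - 22
empty = list(range(1, hour_field_gap))   # a range with a negative stop yields nothing

# Subtracting the datetimes gives a timedelta. It has no .hour attribute,
# but total_seconds() can be converted to whole hours.
delta = t1 - t2
elapsed_hours = int(delta.total_seconds() // 3600)
```

Here hour_field_gap is -13 and the list is empty, while the actual elapsed time is 11 hours.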

Does anyone know how to properly modify the function above to get the desired result?

Upvotes: 1

Views: 162

Answers (1)

Ranga Vure

Reputation: 1932

Here is an updated function that fills in the hours between two timestamps, including when they fall on different days:

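from dateutil.relativedelta import relativedelta

def missing_hours(t1, t2):
    # assumes t1 is the later timestamp; t1 - t2 is a timedelta
    diff = t1 - t2
    days, seconds = diff.days, diff.seconds
    # whole hours elapsed, counting full days so a date change is handled
    hours = days * 24 + seconds // 3600

    # step back from t1 one hour at a time, for x = 1 .. hours - 1
    return [t1 + relativedelta(hours=-x) for x in range(1, hours)]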
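
To check the idea without Spark or dateutil, here is a rough standard-library sketch of the same approach. It swaps relativedelta for datetime.timedelta, and it assumes that t1 is the later timestamp and that both timestamps fall on whole hours; the variable name gaps is only for illustration.

```python
from datetime import datetime, timedelta

def missing_hours(t1, t2):
    """Return hourly timestamps stepping back from t1 toward t2, newest first."""
    # Assumption: t1 is later than t2. Otherwise hours <= 0 and the list is empty.
    hours = int((t1 - t2).total_seconds() // 3600)
    # x runs from 1 to hours - 1, so neither t1 nor t1 - hours is included.
    return [t1 - timedelta(hours=x) for x in range(1, hours)]

# The timestamps from the question.
t1 = datetime(2019, 12, 27, 9, 0, 0)
t2 = datetime(2019, 12, 26, 22, 0, 0)
gaps = missing_hours(t1, t2)
```

For these inputs gaps holds ten timestamps, from 2019-12-27 08:00:00 back to 2019-12-26 23:00:00. The function can be wrapped the same way as in the question, with udf(missing_hours, ArrayType(TimestampType())).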

Upvotes: 1
