Reputation: 5234
I have the following sample dataframe. The date_1 and date_2 columns have datatype of timestamp.
ID date_1 date_2 date_diff
A 2019-01-09T01:25:00.000Z 2019-01-10T14:00:00.000Z -1
B 2019-01-12T02:18:00.000Z 2019-01-12T17:00:00.000Z 0
I want to find the different between date_1 and date_2 in minutes.
When I use the code below, it gives me the date_diff column in whole integer values (days):
df = df.withColumn("date_diff", F.datediff(F.col('date_1'), F.col('date_2')))
But what I want is for date_diff to take into consideration the timestamp and give me minutes back.
How do I do this?
Upvotes: 26
Views: 42554
Reputation: 43544
Just convert the timestamps to unix timestamps (seconds since epoch), compute the difference, and divide by 60.
For example:
import pyspark.sql.functions as F
df.withColumn(
"date_diff_min",
(F.col("date_1").cast("long") - F.col("date_2").cast("long"))/60.
).show(truncate=False)
Upvotes: 40