PineNuts0

Reputation: 5234

PySpark: Subtract Two Timestamp Columns and Give Back Difference in Minutes (Using F.datediff gives back only whole days)

I have the following sample dataframe. The date_1 and date_2 columns are of timestamp type.

ID  date_1                      date_2                      date_diff
A   2019-01-09T01:25:00.000Z    2019-01-10T14:00:00.000Z    -1
B   2019-01-12T02:18:00.000Z    2019-01-12T17:00:00.000Z    0

I want to find the difference between date_1 and date_2 in minutes.

When I use the code below, it gives me date_diff only as a whole number of days:

df = df.withColumn("date_diff", F.datediff(F.col('date_1'), F.col('date_2')))  

But what I want is for date_diff to take into consideration the timestamp and give me minutes back.

How do I do this?

Upvotes: 26

Views: 42554

Answers (1)

pault

Reputation: 43544

Just convert the timestamps to Unix timestamps (seconds since the epoch), compute the difference, and divide by 60. Casting a timestamp column to long does the conversion for you.

For example:

import pyspark.sql.functions as F

df.withColumn(
    "date_diff_min",
    # casting a timestamp to long yields seconds since the epoch,
    # so the difference divided by 60 is the gap in minutes
    (F.col("date_1").cast("long") - F.col("date_2").cast("long")) / 60.
).show(truncate=False)
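
If you prefer to make the "seconds since epoch" step explicit, F.unix_timestamp gives the same result; here is a minimal sketch using the column names from the question's sample dataframe:

import pyspark.sql.functions as F

# equivalent sketch: unix_timestamp returns epoch seconds for a timestamp column
df.withColumn(
    "date_diff_min",
    (F.unix_timestamp("date_1") - F.unix_timestamp("date_2")) / 60.0
).show(truncate=False)

Note that with date_1 earlier than date_2, as in the sample rows, the result is negative; swap the operands or wrap the expression in F.abs if you only care about the magnitude.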

Upvotes: 40
