Reputation: 1015
I have a dataframe, that contain, 2 columns of date start_date
and finish_date
; and I created a new column to add the moyen between the 2 dates.
+-----+--------+-------+---------+-----+--------------------+-------------------
start_date| finish_date| moyen_date|
+-----+--------+-------+---------+-----+--------------------+-------------------
2010-11-03 15:56:... |2010-11-03 17:43:...| 0|
2010-11-03 17:43:... |2010-11-05 13:21:...| 2|
2010-11-05 13:21:... |2010-11-05 14:08:...| 0|
2010-11-05 14:08:... |2010-11-05 14:08:...| 0|
+-----+--------+-------+---------+-----+--------------------+-------------------
I calculated the difference between the 2 dates:
var result = sqlDF.withColumn("moyen_date",datediff(col("finish_date"), col("start_date")))
But I want to convert start_date and finish_date to integer, knowing that each column contain date + time
.
Someone can help me please. ?
Thank you
Upvotes: 2
Views: 11971
Reputation: 10082
Considering this as part of your dataframe:
df.show(false)
+---------------------+
|ts |
+---------------------+
|2010-11-03 15:56:34.0|
+---------------------+
unix_timestamp
returns the number of milliseconds since epoch. The input column should be of type timestamp
. The output column is of type long
.
df.withColumn("unix_ts" , unix_timestamp($"ts").show(false)
+---------------------+----------+
|ts |unix_ts |
+---------------------+----------+
|2010-11-03 15:56:34.0|1288817794|
+---------------------+----------+
To convert it back to timestamp format of your choice, you can use from_unixtime
which also takes an optional timestamp format as a parameter. You are using to_date
, that's why you're only getting the date and not the time.
df.withColumn("unix_ts" , unix_timestamp($"ts") )
.withColumn("from_utime" , from_unixtime($"unix_ts" , "yyyy-MM-dd HH:mm:ss.S"))
.show(false)
+---------------------+----------+---------------------+
|ts |unix_ts |from_utime |
+---------------------+----------+---------------------+
|2010-11-03 15:56:34.0|1288817794|2010-11-03 15:56:34.0|
+---------------------+----------+---------------------+
The column from_utime
here will be of type string
though. To convert it to timestamp
, you can simple use:
df.withColumn("from_utime" , $"from_utime".cast("timestamp") )
Since it's already in ISO date format, no specific conversion is needed. For any other format, you will need to use a combination of unix_timestamp
and from_unixtime
.
Upvotes: 4