vero

Reputation: 1015

convert date to integer scala spark

I have a dataframe that contains two date columns, start_date and finish_date, and I created a new column, moyen_date, to hold the difference in days between the two dates.

+--------------------+--------------------+----------+
|start_date          |finish_date         |moyen_date|
+--------------------+--------------------+----------+
|2010-11-03 15:56:...|2010-11-03 17:43:...|         0|
|2010-11-03 17:43:...|2010-11-05 13:21:...|         2|
|2010-11-05 13:21:...|2010-11-05 14:08:...|         0|
|2010-11-05 14:08:...|2010-11-05 14:08:...|         0|
+--------------------+--------------------+----------+

I calculated the difference between the 2 dates:

var result = sqlDF.withColumn("moyen_date",datediff(col("finish_date"), col("start_date")))

But I want to convert start_date and finish_date to integers, knowing that each column contains both a date and a time.

Can someone help me, please?

Thank you

Upvotes: 2

Views: 11971

Answers (1)

philantrovert

Reputation: 10082

Considering this as part of your dataframe:

df.show(false)
+---------------------+
|ts                   |
+---------------------+
|2010-11-03 15:56:34.0|
+---------------------+

unix_timestamp returns the number of seconds since the Unix epoch. Here the input column is of type timestamp; the output column is of type long.

df.withColumn("unix_ts" , unix_timestamp($"ts").show(false)
+---------------------+----------+
|ts                   |unix_ts   |
+---------------------+----------+
|2010-11-03 15:56:34.0|1288817794|
+---------------------+----------+

To convert it back to a timestamp format of your choice, you can use from_unixtime, which also takes an optional format string as a second parameter. You are using to_date; that's why you're only getting the date and not the time.

df.withColumn("unix_ts" , unix_timestamp($"ts") )
  .withColumn("from_utime" , from_unixtime($"unix_ts" , "yyyy-MM-dd HH:mm:ss.S"))
  .show(false)
+---------------------+----------+---------------------+
|ts                   |unix_ts   |from_utime           |
+---------------------+----------+---------------------+
|2010-11-03 15:56:34.0|1288817794|2010-11-03 15:56:34.0|
+---------------------+----------+---------------------+

The column from_utime here will be of type string, though. To convert it to timestamp, you can simply use:

df.withColumn("from_utime" , $"from_utime".cast("timestamp") )

Since it's already in ISO date format, no specific conversion is needed. For any other format, you will need to use a combination of unix_timestamp and from_unixtime.
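
As a rough sketch (not from the answer above), here is how the same functions could be applied to the question's sqlDF with its start_date and finish_date columns. The other_date column and its dd/MM/yyyy HH:mm pattern are hypothetical, just to illustrate a non-ISO input:

import org.apache.spark.sql.functions.{col, unix_timestamp, from_unixtime}

// Integer-like view of the question's timestamp columns:
// unix_timestamp gives seconds since the epoch, as a long.
val withInts = sqlDF
  .withColumn("start_int", unix_timestamp(col("start_date")))
  .withColumn("finish_int", unix_timestamp(col("finish_date")))

// For a string column in some other format, pass the pattern to
// unix_timestamp, then format it back with from_unixtime.
val reparsed = withInts
  .withColumn("other_ts", unix_timestamp(col("other_date"), "dd/MM/yyyy HH:mm"))
  .withColumn("other_str", from_unixtime(col("other_ts"), "yyyy-MM-dd HH:mm:ss"))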

Upvotes: 4
