SimbaPK
SimbaPK

Reputation: 596

spark scala split timestamp column in date column and time column

I have a problem with split of a timestamp column into Date and time columns. First the time doesn't consider 24h format ... Second the date is false and i don't understand why

here is my output

+----------+----------+-------------------+---------+
|      Date| Timestamp|               Time|EventTime|
+----------+----------+-------------------+---------+
|2018-00-30|1540857600|2018-10-30 00:00:00| 12:00:00|
|2018-00-30|1540857610|2018-10-30 00:00:10| 12:00:10|
|2018-00-30|1540857620|2018-10-30 00:00:20| 12:00:20|
|2018-00-30|1540857630|2018-10-30 00:00:30| 12:00:30|
|2018-00-30|1540857640|2018-10-30 00:00:40| 12:00:40|
|2018-00-30|1540857650|2018-10-30 00:00:50| 12:00:50|
|2018-01-30|1540857660|2018-10-30 00:01:00| 12:01:00|
|2018-01-30|1540857670|2018-10-30 00:01:10| 12:01:10|
|2018-01-30|1540857680|2018-10-30 00:01:20| 12:01:20|
|2018-01-30|1540857690|2018-10-30 00:01:30| 12:01:30|
|2018-01-30|1540857700|2018-10-30 00:01:40| 12:01:40|

and my code :

  val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
    .withColumn("Date", date_format(col("Time"), "yyyy-mm-dd"))
    .withColumn("EventTime", date_format(col("Time"), "hh:mm:ss"))

first I convert the unix Timestamp column to Time column, and then i want to split Time..

Thank you in advance

Upvotes: 0

Views: 4083

Answers (2)

sramalingam24
sramalingam24

Reputation: 1337

You can avoid the confusion with simple casting

import org.apache.spark.sql.functions._

val df = data_input
    .withColumn("Time", $"Timestamp".cast("timestamp"))
    .withColumn("Date", $"Time".cast("date"))
    .withColumn("EventTime", date_format($"Time", "H:m:s"))

+----------+-------------------+----------+---------+
|Timestamp |               Time|      Date|EventTime|
+----------+-------------------+----------+---------+
|1540857600|2018-10-30 00:00:00|2018-10-30|    0:0:0|
|1540857610|2018-10-30 00:00:10|2018-10-30|   0:0:10|
|1540857620|2018-10-30 00:00:20|2018-10-30|   0:0:20|
|1540857630|2018-10-30 00:00:30|2018-10-30|   0:0:30|
|1540857640|2018-10-30 00:00:40|2018-10-30|   0:0:40|
|1540857650|2018-10-30 00:00:50|2018-10-30|   0:0:50|
|1540857660|2018-10-30 00:01:00|2018-10-30|    0:1:0|
|1540857670|2018-10-30 00:01:10|2018-10-30|   0:1:10|
|1540857680|2018-10-30 00:01:20|2018-10-30|   0:1:20|
|1540857690|2018-10-30 00:01:30|2018-10-30|   0:1:30|
|1540857700|2018-10-30 00:01:40|2018-10-30|   0:1:40|
+----------+-------------------+----------+---------+

Upvotes: 2

Ryan Widmaier
Ryan Widmaier

Reputation: 8523

You are using the wrong format codes. Specifically "mm" in your date is for minutes, and "hh" is for 12 hour values. Instead you want "MM" and "HH". Like this:

val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
    .withColumn("Date", date_format(col("Time"), "yyyy-MM-dd"))
    .withColumn("EventTime", date_format(col("Time"), "HH:mm:ss"))

For reference, here are the date format codes that you can use: SimpleDateFormat

Upvotes: 3

Related Questions