Reputation: 596
I have a problem with split of a timestamp column into Date and time columns. First the time doesn't consider 24h format ... Second the date is false and i don't understand why
here is my output
+----------+----------+-------------------+---------+
| Date| Timestamp| Time|EventTime|
+----------+----------+-------------------+---------+
|2018-00-30|1540857600|2018-10-30 00:00:00| 12:00:00|
|2018-00-30|1540857610|2018-10-30 00:00:10| 12:00:10|
|2018-00-30|1540857620|2018-10-30 00:00:20| 12:00:20|
|2018-00-30|1540857630|2018-10-30 00:00:30| 12:00:30|
|2018-00-30|1540857640|2018-10-30 00:00:40| 12:00:40|
|2018-00-30|1540857650|2018-10-30 00:00:50| 12:00:50|
|2018-01-30|1540857660|2018-10-30 00:01:00| 12:01:00|
|2018-01-30|1540857670|2018-10-30 00:01:10| 12:01:10|
|2018-01-30|1540857680|2018-10-30 00:01:20| 12:01:20|
|2018-01-30|1540857690|2018-10-30 00:01:30| 12:01:30|
|2018-01-30|1540857700|2018-10-30 00:01:40| 12:01:40|
and my code :
val df = data_input
.withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
.withColumn("Date", date_format(col("Time"), "yyyy-mm-dd"))
.withColumn("EventTime", date_format(col("Time"), "hh:mm:ss"))
first I convert the unix Timestamp column to Time column, and then i want to split Time..
Thank you in advance
Upvotes: 0
Views: 4083
Reputation: 1337
You can avoid the confusion with simple casting
import org.apache.spark.sql.functions._
val df = data_input
.withColumn("Time", $"Timestamp".cast("timestamp"))
.withColumn("Date", $"Time".cast("date"))
.withColumn("EventTime", date_format($"Time", "H:m:s"))
+----------+-------------------+----------+---------+
|Timestamp | Time| Date|EventTime|
+----------+-------------------+----------+---------+
|1540857600|2018-10-30 00:00:00|2018-10-30| 0:0:0|
|1540857610|2018-10-30 00:00:10|2018-10-30| 0:0:10|
|1540857620|2018-10-30 00:00:20|2018-10-30| 0:0:20|
|1540857630|2018-10-30 00:00:30|2018-10-30| 0:0:30|
|1540857640|2018-10-30 00:00:40|2018-10-30| 0:0:40|
|1540857650|2018-10-30 00:00:50|2018-10-30| 0:0:50|
|1540857660|2018-10-30 00:01:00|2018-10-30| 0:1:0|
|1540857670|2018-10-30 00:01:10|2018-10-30| 0:1:10|
|1540857680|2018-10-30 00:01:20|2018-10-30| 0:1:20|
|1540857690|2018-10-30 00:01:30|2018-10-30| 0:1:30|
|1540857700|2018-10-30 00:01:40|2018-10-30| 0:1:40|
+----------+-------------------+----------+---------+
Upvotes: 2
Reputation: 8523
You are using the wrong format codes. Specifically "mm" in your date is for minutes, and "hh" is for 12 hour values. Instead you want "MM" and "HH". Like this:
val df = data_input
.withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
.withColumn("Date", date_format(col("Time"), "yyyy-MM-dd"))
.withColumn("EventTime", date_format(col("Time"), "HH:mm:ss"))
For reference, here are the date format codes that you can use: SimpleDateFormat
Upvotes: 3