milad ahmadi

Reputation: 545

What data type should be used for a time column

In my Spark application, I had to split the date and time and store them in separate columns, as follows:

val df5 = df4
  .withColumn("read_date", date_format(df4.col("date"), "yyyy-MM-dd"))
  .withColumn("read_time", date_format(df4.col("date"), "HH:mm:ss"))
  .drop("date")

This command splits the date and time:

+------------+-----------+
| read_date  | read_time |
+------------+-----------+
| 2012-01-12 | 00:06:00  |
+------------+-----------+

but it creates both fields as String. I can .cast("date") for the date column, but what data type should I use for the time column? If I use .cast("timestamp"), it combines the current server date with the time. Since we are going to visualize the data in Power BI, is storing the time as a String the right approach?

Upvotes: 4

Views: 7886

Answers (1)

Lakshman Battini

Reputation: 1912

There is no DataType in Spark to hold 'HH:mm:ss' values. Instead, you can use the hour(), minute() and second() functions to extract the respective parts of the time.

All these functions return int types.

hour(string date) -- Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.

minute(string date) -- Returns the minute of the timestamp.

second(string date) -- Returns the second of the timestamp.
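Applied to the question's code, this could look like the sketch below: extract the int components directly from the original timestamp column "date" before dropping it, rather than re-parsing the String "read_time" column. The DataFrame names df4/df5 are taken from the question; the read_hour/read_minute/read_second column names are illustrative.

import org.apache.spark.sql.functions.{col, date_format, hour, minute, second}

// Sketch, assuming df4 has a timestamp column "date" as in the question.
// Keep the date part as a proper date type, and store the time of day
// as three int columns that BI tools can sort and filter on.
val df5 = df4
  .withColumn("read_date", date_format(col("date"), "yyyy-MM-dd").cast("date"))
  .withColumn("read_hour", hour(col("date")))
  .withColumn("read_minute", minute(col("date")))
  .withColumn("read_second", second(col("date")))
  .drop("date")

If Power BI only needs to display the time rather than compute with it, keeping the formatted 'HH:mm:ss' String column alongside these int columns is also a reasonable option.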

Upvotes: 3
