Reputation: 1108
I have a dataframe with one of its columns containing an epoch time. I want to extract only the hour from it and display it as a separate column.
Below is the sample dataframe:
+----------+-------------+
|    NUM_ID|        STIME|
+----------+-------------+
|xxxxxxxx01|1571634285000|
|xxxxxxxx01|1571634299000|
|xxxxxxxx01|1571634311000|
|xxxxxxxx01|1571634316000|
|xxxxxxxx02|1571634318000|
|xxxxxxxx02|1571398176000|
|xxxxxxxx02|1571627596000|
+----------+-------------+
Below is the expected output.
+----------+-------------+-----+
|    NUM_ID|        STIME| HOUR|
+----------+-------------+-----+
|xxxxxxxx01|1571634285000|   10|
|xxxxxxxx01|1571634299000|   10|
|xxxxxxxx01|1571634311000|   10|
|xxxxxxxx01|1571634316000|   10|
|xxxxxxxx02|1571634318000|   10|
|xxxxxxxx02|1571398176000|   16|
|xxxxxxxx02|1571627596000|   08|
+----------+-------------+-----+
I have tried:
val test = test1DF.withColumn("TIME", extract HOUR(from_unixtime($"STIME"/1000)))
which throws an exception:
<console>:46: error: not found: value extract
I also tried the following to obtain a date format, but it is not working either:
val test = test1DF.withColumn("TIME", to_timestamp(from_unixtime(col("STIME"))))
The datatype of STIME in the dataframe is Long.
Any leads on extracting the hour from an epoch time stored as a Long?
Upvotes: 0
Views: 641
Reputation: 74365
Extracting the hour from a timestamp is as simple as using the hour()
function:
import org.apache.spark.sql.functions._

// STIME is in milliseconds; from_unixtime expects seconds, hence the division
val df_with_hour = df.withColumn("TIME", hour(from_unixtime($"STIME" / 1000)))
df_with_hour.show()
// +-------------+----+
// |        STIME|TIME|
// +-------------+----+
// |1571634285000|   5|
// |1571398176000|  11|
// |1571627596000|   3|
// +-------------+----+
(Note: I'm in a different timezone, which is why my hour values differ from your expected output.)
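If the hour has to match a specific timezone rather than whatever the cluster defaults to, from_unixtime honors the session timezone, which you can pin explicitly (Spark 2.2+). A minimal sketch, assuming a SparkSession named spark; the Asia/Kolkata zone is purely illustrative, not taken from your question:

// Pin the session timezone so from_unixtime (and therefore hour) is deterministic
spark.conf.set("spark.sql.session.timeZone", "Asia/Kolkata")
val df_in_zone = df.withColumn("HOUR", hour(from_unixtime($"STIME" / 1000)))

Also, hour() returns an integer, so you will get 8 rather than the zero-padded "08" shown in your expected output. If you need the two-digit string form, a date_format sketch with the "HH" pattern does that:

// date_format returns a string column; "HH" is the zero-padded 24-hour pattern
val df_padded = df.withColumn("HOUR", date_format(from_unixtime($"STIME" / 1000), "HH"))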
Upvotes: 1