Frau P

Reputation: 144

How to extract time from timestamp in pyspark?

I need to extract the time from a timestamp column in a DataFrame using PySpark. For example, given the timestamp 2019-01-03T18:21:39, I want to extract only the time, "18:21:39", always zero-padded in the form "01:01:01".

from pyspark.sql.types import StringType, TimestampType

df = spark.createDataFrame(["2020-06-17T00:44:30","2020-06-17T06:06:56","2020-06-17T15:04:34"], StringType()).toDF('datetime')

df = df.select(df['datetime'].cast(TimestampType()))

I tried the following, but did not get the expected result:

from pyspark.sql.functions import concat, hour, lit, minute, second
df1 = df.withColumn('time', concat(hour(df['datetime']), lit(":"), minute(df['datetime']), lit(":"), second(df['datetime'])))

display(df1)
+-------------------+-------+
|           datetime|   time|
+-------------------+-------+
|2020-06-17 00:44:30|0:44:30|
|2020-06-17 06:06:56| 6:6:56|
|2020-06-17 15:04:34|15:4:34|
+-------------------+-------+

My results look like 6:6:56, but I want them to be 06:06:56.

Upvotes: 9

Views: 28709

Answers (1)

Lamanus

Reputation: 13581

Use the date_format function with the pattern 'HH:mm:ss', which zero-pads the hours, minutes, and seconds to two digits each.

from pyspark.sql.functions import date_format
from pyspark.sql.types import StringType

df = spark \
  .createDataFrame(["2020-06-17T00:44:30","2020-06-17T06:06:56","2020-06-17T15:04:34"], StringType()) \
  .toDF('datetime')

# 'HH:mm:ss' formats each field as two digits, so 6 becomes 06
q = df.withColumn('time', date_format('datetime', 'HH:mm:ss'))

>>> q.show()
+-------------------+--------+
|           datetime|    time|
+-------------------+--------+
|2020-06-17T00:44:30|00:44:30|
|2020-06-17T06:06:56|06:06:56|
|2020-06-17T15:04:34|15:04:34|
+-------------------+--------+
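
As for why the concat attempt in the question lost the zeros: hour, minute, and second return integers, and an integer turned into a string carries no padding. The sketch below is only an illustration in plain Python with the standard library, not Spark code; it assumes the strftime pattern "%H:%M:%S" as the counterpart of the 'HH:mm:ss' pattern used above.

```python
from datetime import datetime

# One of the sample values from the question
ts = datetime.fromisoformat("2020-06-17T06:06:56")

# Joining the integer parts drops leading zeros, like the concat() attempt
unpadded = ":".join(str(part) for part in (ts.hour, ts.minute, ts.second))

# A format pattern pads every field to two digits, like date_format
padded = ts.strftime("%H:%M:%S")

print(unpadded)  # 6:6:56
print(padded)    # 06:06:56
```

A single format pattern is usually simpler than padding each part by hand.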

Upvotes: 15
