I am using PySpark 2.3. I have a DataFrame containing a 'TIME' column with DateTime values stored as strings. The column looks like:
+---------------+
| TIME|
+---------------+
| 2016/04/14 190|
| 2016/04/15 180|
|2016/04/14 1530|
|2016/04/16 1530|
| 2016/04/17 160|
+---------------+
where the first two digits of a value like 190 or 1530 represent the hour and the remaining digits are the minutes.
I tried to convert it to a timestamp type using the following lines:
df.withColumn('TIME_timestamp',fn.unix_timestamp('TIME','yyyy/MM/dd HHMM').cast(TimestampType()))
and also :
df.withColumn('TIME_timestamp', fn.to_timestamp("TIME", 'yyyy/MM/dd HHMM'))
but the resulting is:
+---------------+-------------------+
| TIME| TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190| null|
| 2016/04/15 180| null|
|2016/04/14 1530| null|
|2016/04/16 1530| null|
| 2016/04/17 160| null|
+---------------+-------------------+
So the desired DataFrame should look like:
+---------------+
| TIME_timestamp|
+---------------+
| 16-04-14 19:00|
| 16-04-15 18:00|
| 16-04-14 15:30|
| 16-04-16 15:30|
| 16-04-17 16:00|
+---------------+
Upvotes: 3
You used capital M to identify both the months and the minutes; the minutes should be identified with a lowercase m, see here. A working example using to_timestamp is given below, hope this helps!
import pyspark.sql.functions as F

df = sqlContext.createDataFrame(
    [
        ('2016/04/14 190',),
        ('2016/04/15 180',),
        ('2016/04/14 1530',),
        ('2016/04/16 1530',),
        ('2016/04/17 160',)
    ],
    ('TIME',)
)
df.withColumn('TIME_timestamp', F.to_timestamp('TIME', 'yyyy/MM/dd HHmm')).show()
Output:
+---------------+-------------------+
| TIME| TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190|2016-04-14 19:00:00|
| 2016/04/15 180|2016-04-15 18:00:00|
|2016/04/14 1530|2016-04-14 15:30:00|
|2016/04/16 1530|2016-04-16 15:30:00|
| 2016/04/17 160|2016-04-17 16:00:00|
+---------------+-------------------+
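For intuition, the way the HHmm pattern resolves these strings (first two digits are the hour, whatever remains is the minutes) can be sketched in plain Python without Spark. Note that parse_time is a hypothetical helper for illustration only, not part of the PySpark solution:

```python
from datetime import datetime

def parse_time(s: str) -> datetime:
    # Split into the date part ('2016/04/14') and the time part ('190').
    date_part, time_part = s.strip().split()
    # First two digits are the hour; the remaining digit(s) are the minutes,
    # mirroring how the 'HHmm' pattern consumes '190' as 19:00.
    hour, minute = int(time_part[:2]), int(time_part[2:])
    year, month, day = map(int, date_part.split('/'))
    return datetime(year, month, day, hour, minute)

print(parse_time('2016/04/14 190'))   # 2016-04-14 19:00:00
print(parse_time('2016/04/14 1530'))  # 2016-04-14 15:30:00
```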
Upvotes: 5