I am using PySpark 2.3. I have a DataFrame containing a 'TIME' column with DateTime values stored as strings. The column looks like:
+---------------+
| TIME|
+---------------+
| 2016/04/14 190|
| 2016/04/15 180|
|2016/04/14 1530|
|2016/04/16 1530|
| 2016/04/17 160|
+---------------+
where the first two digits of a value like 190 or 1530 represent the hour and the remaining digits are the minutes.
I tried to convert it to a timestamp type using the following lines:
df.withColumn('TIME_timestamp',fn.unix_timestamp('TIME','yyyy/MM/dd HHMM').cast(TimestampType()))
and also :
df.withColumn('TIME_timestamp', fn.to_timestamp("TIME", 'yyyy/MM/dd HHMM'))
but the resulting is:
+---------------+-------------------+
| TIME| TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190| null|
| 2016/04/15 180| null|
|2016/04/14 1530| null|
|2016/04/16 1530| null|
| 2016/04/17 160| null|
+---------------+-------------------+
So the desired DataFrame should look like:
+---------------+
| TIME_timestamp|
+---------------+
| 16-04-14 19:00|
| 16-04-15 18:00|
| 16-04-14 15:30|
| 16-04-16 15:30|
| 16-04-17 16:00|
+---------------+
Upvotes: 3
You used capital M to identify both the months and the minutes; the minutes should be identified with a lowercase m, see here. A working example using to_timestamp is given below, hope this helps!
import pyspark.sql.functions as F

df = sqlContext.createDataFrame(
    [
        ('2016/04/14 190',),
        ('2016/04/15 180',),
        ('2016/04/14 1530',),
        ('2016/04/16 1530',),
        ('2016/04/17 160',)
    ],
    ('TIME',)
)
df.withColumn('TIME_timestamp', F.to_timestamp('TIME', 'yyyy/MM/dd HHmm')).show()
Output:
+---------------+-------------------+
| TIME| TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190|2016-04-14 19:00:00|
| 2016/04/15 180|2016-04-15 18:00:00|
|2016/04/14 1530|2016-04-14 15:30:00|
|2016/04/16 1530|2016-04-16 15:30:00|
| 2016/04/17 160|2016-04-17 16:00:00|
+---------------+-------------------+
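For intuition, the way the HHmm pattern resolves these strings (first two digits are the hour, whatever remains is the minutes) can be sketched in plain Python without Spark. Note that parse_time is a hypothetical helper for illustration only, not part of the PySpark solution:

```python
from datetime import datetime

def parse_time(s: str) -> datetime:
    # Split into the date part ('2016/04/14') and the time part ('190').
    date_part, time_part = s.strip().split()
    # First two digits are the hour; the remaining digit(s) are the minutes,
    # mirroring how the 'HHmm' pattern consumes '190' as 19:00.
    hour, minute = int(time_part[:2]), int(time_part[2:])
    year, month, day = map(int, date_part.split('/'))
    return datetime(year, month, day, hour, minute)

print(parse_time('2016/04/14 190'))   # 2016-04-14 19:00:00
print(parse_time('2016/04/14 1530'))  # 2016-04-14 15:30:00
```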
Upvotes: 5