Reputation: 1530
I need to fetch Year, Month, Day, Hour
from strings in the column Time
in Spark df. I also need to save those values in separate columns. The data looks like this:
ID Time
111 2020-03-23-12:40:04
112 2020-04-23-12:40:04
113 2020-05-23-12:40:04
The desired output is:
ID Year Month Day
111 2020 03 23
112 2020 04 23
113 2020 05 23
I tried:
data_df.select(
year("Time").alias('year'),
month("Time").alias('month'),
dayofmonth("Time").alias('day')
).show()
It returned all Null values.
Upvotes: 1
Views: 613
Reputation: 10406
I assume that the Time
column is a string. You can either extract the values you want with regular expressions but that would be quite painful. The other option is to convert the string into a timestamp and then use the functions year
, month
etc.
Also, your timestamp is not in a standard format, so you need to specify it.
data = [(111, '2020-03-23-12:40:04'),
(112, '2020-04-23-12:40:04'),
(113, '2020-05-23-12:40:04')]
df = spark.createDataFrame(data, ['ID', 'Time'])
df\
.withColumn('t', F.to_timestamp('Time', 'yyyy-MM-dd-HH:mm:ss'))\
.select('ID',
F.year('t').alias('year'),
F.month('t').alias('month'),
F.dayofmonth('t').alias('day'),
F.hour('t').alias('hour')
).show()
which yields:
+---+----+-----+---+----+
| ID|year|month|day|hour|
+---+----+-----+---+----+
|111|2020| 3| 23| 12|
|112|2020| 4| 23| 12|
|113|2020| 5| 23| 12|
+---+----+-----+---+----+
Upvotes: 2