Reputation: 18639
I have four string columns 'hour', 'day', 'month', 'year'
in my data frame. I would like to create new column fulldate in format 'dd/MM/yyyy HH:mm'
.
df2 = df1.withColumn("fulldate", to_date(concat(col('day'), lit('/'), col('month'), lit('/'), col('year'), lit(' '), col('hour'), lit(':'), lit('0'), lit('0')), 'dd/MM/yyyy HH:mm'))
but it doesn't seem to work. I'm getting format "yyyy-mm-dd".
Am I missing something?
Upvotes: 1
Views: 2749
Reputation: 32700
For Spark 3+, you can use make_timestamp
function to create a timestamp column from those columns and use date_format
to convert it to the desired date pattern :
from pyspark.sql import functions as F
df2 = df1.withColumn(
"fulldate",
F.date_format(
F.expr("make_timestamp(year, month, day, hour, 0, 0)"),
"dd/MM/yyyy HH:mm"
)
)
Upvotes: 2
Reputation: 42422
Use date_format
instead of to_date
.
to_date
converts a column to date type from the given format, while date_format
converts a date type column to the given format.
from pyspark.sql.functions import date_format, concat, col, lit
df2 = df1.withColumn(
"fulldate",
date_format(
concat(col('year'), lit('/'), col('month'), lit('/'), col('day'), lit(' '), col('hour'), lit(':'), lit('00'), lit(':'), lit('00')),
'dd/MM/yyyy HH:mm'
)
)
For better readability, you can use format_string
:
from pyspark.sql.functions import date_format, format_string, col
df2 = df1.withColumn(
"fulldate",
date_format(
format_string('%d/%d/%d %d:00:00', col('year'), col('month'), col('day'), col('hour')),
'dd/MM/yyyy HH:mm'
)
)
Upvotes: 1