BENOTH7
BENOTH7

Reputation: 37

Converting column data type from string to date with PySpark returns null values

I was trying to change the datatype of a column (Disponibility) from string type to date, but every time it shows this column converted as null values (for example: 23/01/2022 becomes null ....)

This is my code:

dfwdate = dfworkers2.withColumn("Disponibility", to_date("Disponibility")) \
.show(truncate=False)   

Upvotes: 1

Views: 1909

Answers (2)

ZygD
ZygD

Reputation: 24478

to_date('Disponibility', 'dd/MM/yyyy')

You have correctly chosen the function to_date. It's just that in your case it requires another parameter - the actual format of your date string 'dd-MM-yyyy'. By default, to_date expects to get the format 'yyyy-MM-dd'. Since your column is not of this format, you get null returned.

Full example:

from pyspark.sql import functions as F
df = spark.createDataFrame([('23/01/2022',)], ['Disponibility'])

df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   23/01/2022|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'string')]

df = df.withColumn('Disponibility', F.to_date('Disponibility', 'dd/MM/yyyy'))

df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   2022-01-23|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'date')]

Upvotes: 1

Vaebhav
Vaebhav

Reputation: 5052

You additionally need to supply the date_format with to_date , various format references can be found Spark date pattern documentation page

Date Conversion Examples

s = StringIO("""
date_str
2022-03-01
2022-05-20
2022-06-21
2022-10-22
""")

df = pd.read_csv(s,delimiter=',')

sparkDF = sql.createDataFrame(df)\
             .withColumn('date_parsed',F.to_date(F.col('date_str'), 'yyyy-MM-dd'))\
             .drop('date_str')

sparkDF.show()

+-----------+
|date_parsed|
+-----------+
| 2022-03-01|
| 2022-05-20|
| 2022-06-21|
| 2022-10-22|
+-----------+

sparkDF.printSchema()

root
 |-- date_parsed: date (nullable = true)

Upvotes: 2

Related Questions