Reputation: 239
I have a pretty simple question for which I couldn't find a simple answer: I want to convert the type of a column in my PySpark DataFrame from a String to a Date. How can I do this?
I tried the following:
df.withColumn('dates', datetime.strptime(col('date'), '%Y%m%d'))
and
df.withColumn('dates', datetime.strptime(df.date, '%Y%m%d'))
but in each case I get the following error: TypeError: strptime() argument 1 must be str, not Column.
So obviously col('date') and df.date are interpreted as Column objects rather than the strings they hold. How can I fix this?
Upvotes: 1
Views: 1806
Reputation: 3110
You can convert the string column to a date using the "cast" function if the format is "yyyy-MM-dd", or you can use the "to_date" function, which is more general and lets you specify the input format as well.
Here is the sample code.
from pyspark.sql.functions import to_date

# Create DataFrame
df = spark.createDataFrame([(1, "2020-06-03", "2020/06/03"), (2, "2020-05-01", "2020/05/01")], ["id", "date_fmt_1", "date_fmt_2"])
# Convert the string columns to dates.
df1 = (df
       # Approach 1: use the cast function if the format is "yyyy-MM-dd"
       .withColumn("date_1", df["date_fmt_1"].cast("date"))
       # Approach 2: use the to_date function and specify the input format, "yyyy/MM/dd" in our case
       .withColumn("date_2", to_date("date_fmt_2", "yyyy/MM/dd"))
       # If you don't specify a format, to_date assumes Spark's default, "yyyy-MM-dd"
       .withColumn("date_3", to_date(df["date_fmt_1"])))
# Print the schema
df1.printSchema()
root
|-- id: long (nullable = true)
|-- date_fmt_1: string (nullable = true)
|-- date_fmt_2: string (nullable = true)
|-- date_1: date (nullable = true)
|-- date_2: date (nullable = true)
|-- date_3: date (nullable = true)
df1.show()
+---+----------+----------+----------+----------+----------+
| id|date_fmt_1|date_fmt_2| date_1| date_2| date_3|
+---+----------+----------+----------+----------+----------+
| 1|2020-06-03|2020/06/03|2020-06-03|2020-06-03|2020-06-03|
| 2|2020-05-01|2020/05/01|2020-05-01|2020-05-01|2020-05-01|
+---+----------+----------+----------+----------+----------+
For more information about Spark's datetime functions, you can visit this blog: https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-datetime-functions-b66de737950a
I hope this helps.
Upvotes: 5