Reputation: 1815
I have a string date column in the pyspark dataframe as shown below.
df1=spark.createDataFrame(
data = [ ("8/30/2019 12:00:00 AM"),("8/29/2019 12:00:00 AM"),("8/27/2019 12:00:00 AM")], schema=["DateJoined"])
+---------------------+
|DateJoined |
+---------------------+
|8/30/2019 12:00:00 AM|
|8/29/2019 12:00:00 AM|
|8/27/2019 12:00:00 AM
I am trying to convert it to date type using the below code but returning null
df = df.withColumn('date_col',
F.udf(lambda d: datetime.strptime(d, '%m/%d/%Y %I:%M %p').strftime('%m/%d/%Y %H:%M:%S'), T.StringType())(F.col('DateJoined')))
Can anyone help me with this?
Upvotes: 0
Views: 586
Reputation: 5052
You can also utilise the native to_date function , which supports multiple parsings - https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
df = pd.DataFrame({
'DateJoined' : ["8/30/2019 12:00:00 AM","8/29/2019 12:00:00 AM","8/27/2019 12:00:00 AM"]
})
sparkDF = sql.createDataFrame(df)
sparkDF.show()
+---------------------+
|DateJoined |
+---------------------+
|8/30/2019 12:00:00 AM|
|8/29/2019 12:00:00 AM|
|8/27/2019 12:00:00 AM|
+---------------------+
sparkDF = sparkDF.withColumn('DateJoined_parsed',F.to_date(F.col("DateJoined"),"M/dd/yyyy hh:mm:ss a"))
sparkDF.show(truncate=False)
+---------------------+-----------------+
|DateJoined |DateJoined_parsed|
+---------------------+-----------------+
|8/30/2019 12:00:00 AM|2019-08-30 |
|8/29/2019 12:00:00 AM|2019-08-29 |
|8/27/2019 12:00:00 AM|2019-08-27 |
+---------------------+-----------------+
Upvotes: 0
Reputation: 26676
I wouldn't use a udf
df=spark.createDataFrame([(1,'8/30/2019 12:00:00 AM'),
(2,'8/30/2019 12:00:00 AM'),
(3,'8/30/2019 12:00:00 AM')],
('id','DateJoined'))
df.show(truncate=False)
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df.withColumn("DateJoined",F.from_unixtime(F.unix_timestamp("DateJoined",'MM/dd/yyyy hh:mm:ss aa'),'MM/dd/yyyy HH:mm:ss')).show()
or
df.withColumn("DateJoined",F.from_unixtime(F.unix_timestamp("DateJoined",'MM/dd/yyyy hh:mm:ss aa'),'MM/dd/yyyy HH:mm:ss aa')).show(truncate=False)
+---+-------------------+
| id| DateJoined|
+---+-------------------+
| 1|08/30/2019 00:00:00|
| 2|08/30/2019 00:00:00|
| 3|08/30/2019 00:00:00|
+---+-------------------+
Upvotes: 1