Hemant Urvey

Reputation: 1

Converting String Time Stamp to DateTime in pyspark

Parsing a "06/02/2019 7:30 AM" timestamp into a 24-hour clock timestamp in PySpark

This is my DataFrame's schema:

    root
     |-- Customer_Site_ID: string (nullable = true)
     |-- ReceivedDate: string (nullable = true)
     |-- BattV: double (nullable = true)

Converting to a 24-hour DateTime:

from pyspark.sql.functions import to_timestamp
# raw_data_10_sites=raw_data_10_sites.withColumn("ReceivedDate_New", to_timestamp('ReceivedDate', "dd/mm/yyyy hh:mm a"))
raw_data_10_sites.select(to_timestamp(raw_data_10_sites.ReceivedDate, "dd/mm/yyyy HH:mm a").alias('New_RD')).show(10)

Actual column: below is my ReceivedDate column.

+-------------------+
|       ReceivedDate|
+-------------------+
| 06/02/2019 7:30 AM|
| 06/02/2019 8:01 AM|
| 06/02/2019 8:30 AM|
| 06/02/2019 8:49 AM|
| 06/02/2019 8:50 AM|
| 06/02/2019 8:52 AM|
| 06/02/2019 9:30 AM|
| 06/02/2019 9:46 AM|
| 06/02/2019 9:53 AM|
|06/02/2019 10:03 AM|
|06/02/2019 10:17 AM|
|06/02/2019 10:19 AM|
|06/02/2019 10:23 AM|
|06/02/2019 10:49 AM|
|06/02/2019 10:55 AM|
|06/02/2019 10:58 AM|
|06/02/2019 11:30 AM|
|06/02/2019 11:49 AM|
|06/02/2019 12:00 PM|
|06/02/2019 12:02 PM|
+-------------------+

But I got this:

+-------------------+
|             New_RD|
+-------------------+
|2019-01-06 07:30:00|
|2019-01-06 08:01:00|
|2019-01-06 08:30:00|
|2019-01-06 08:49:00|
|2019-01-06 08:50:00|
|2019-01-06 08:52:00|
|2019-01-06 09:30:00|
|2019-01-06 09:46:00|
|2019-01-06 09:53:00|
|2019-01-06 10:03:00|
+-------------------+

The month is completely wrong. Can anyone help? I have tried many functions, but none of them worked for me. I have also gone through many blogs, but this issue seems to be unique. I just don't understand how this is possible.

Upvotes: 0

Views: 1609

Answers (3)

MusHusKat

Reputation: 438

Since both minutes and months start with 'm', the format letters are case sensitive: lower-case 'm' is reserved for minutes and upper-case 'M' for months. Simply change your format to "dd/MM/yyyy h:mm a". Note 'h' rather than 'HH': 'h' is the 12-hour clock hour that pairs with the AM/PM marker 'a', while 'HH' is the 24-hour form.

What's happening right now is that the format specifies minutes twice, and the second occurrence probably takes precedence. The month is missing from the format altogether, so it defaults to 01.

Upvotes: 1

Jesse Amano

Reputation: 828

Month and Minute both start with the letter M, so for the sake of unambiguous time formatting we need a way to tell them apart if we're going to use shorthand.

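In the case of the to_timestamp function, you must use the conventions of Java's SimpleDateFormat class (this applies to Spark 2.x; Spark 3.0 and later define their own datetime patterns, in which these two letters keep the same meaning). In this case, m means minute and M means month.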

So: when you pass in the string "dd/mm/yyyy HH:mm a" to to_timestamp, you are telling it to interpret the string 06/02/2019 as the 6th day, 2nd minute of the year 2019, or January 6, 2019 at 00:02:00 (the time value afterward overrides this since it's parsed later).

If you meant for those dates to be interpreted as 6 February, use capital Ms in your time format string. If you meant for those dates to be interpreted as 2 June, then also transpose the ds and Ms.

Upvotes: 0

Abhijeet Dhumal

Reputation: 1809

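You can use a UDF to parse the date, as shown below. Note that the new column is a string, not a timestamp:

from datetime import datetime
from pyspark.sql import functions as F, types as T

# Parse the 12-hour string in Python, then re-emit it on a 24-hour clock.
parse_received = F.udf(
    lambda d: datetime.strptime(d, '%d/%m/%Y %I:%M %p').strftime('%d/%m/%Y %H:%M:%S'),
    T.StringType())
raw_data_10_sites = raw_data_10_sites.withColumn('date_col', parse_received(F.col('ReceivedDate')))

raw_data_10_sites.show()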
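
The lambda inside the UDF is plain Python, so its format strings can be checked without a Spark session. The sketch below assumes a helper named `to_24h`, which is hypothetical and only mirrors the lambda's logic. Note that Python's strptime codes differ from Spark's pattern letters: here `%m` is the month and `%M` is the minute.

```python
from datetime import datetime


def to_24h(value: str) -> str:
    """Parse day/month/year text with a 12-hour time and re-render it on a 24-hour clock."""
    # %I = 12-hour clock hour, %M = minute, %p = AM/PM marker.
    parsed = datetime.strptime(value, "%d/%m/%Y %I:%M %p")
    return parsed.strftime("%d/%m/%Y %H:%M:%S")


# Two values taken from the ReceivedDate column in the question.
for text in ["06/02/2019 7:30 AM", "06/02/2019 12:02 PM"]:
    print(text, "->", to_24h(text))
```

A Python UDF runs row by row outside the JVM, so the built-in to_timestamp with a correct pattern is usually the lighter option when a real timestamp column is wanted.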

Upvotes: 0
