Reputation: 4480
df1:
Timestamp:
1995-08-01T00:00:01.000+0000
Is there a way to extract the day of the month from the timestamp column of the data frame using PySpark? I am not able to provide any code, as I am new to Spark and do not have a clue how to proceed.
Upvotes: 8
Views: 22003
Reputation: 571
I am working on Apache Spark 3.2.1
I have tried both formats, but neither worked for me; what did work is the to_timestamp call shown at the end of this answer.
I have also seen another difference when using unix_timestamp before to_timestamp; here are the findings:
unix_timestamp: your original timestamp will lose anything after the seconds, e.g. 2024-02-11T20:07:28.099+00:00 will change to 2024-02-11T20:07:28.000+00:00.
Skip unix_timestamp and use the following instead, and you will be able to retain everything:
df = df.withColumn("ConvertToTimestamp", F.to_timestamp("StringTimestamp", format))
Upvotes: 0
Reputation: 17872
You can parse this timestamp using unix_timestamp:
from pyspark.sql import functions as F
format = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
df2 = df1.withColumn('Timestamp2', F.unix_timestamp('Timestamp', format).cast('timestamp'))
Then, you can use dayofmonth in the new Timestamp column:
df2.select(F.dayofmonth('Timestamp2'))
More details about these functions can be found in the PySpark functions documentation.
Upvotes: 14