How can I extract year and month from a string column in Pyspark?

Question

So, I have a column that looks like a timestamp but it's actually a string column. This columns looks like this 2022-04-01T00:00:00.000+0000. However, I tried a couple of things and they didn't work. I tried this:

.withColumn("year", year(to_date(col("full_time"),"yyyy-MM-dd"))) .withColumn("year", to_date(col("cycle.start_time"),"yyyy"))

None of them worked, so now i just don't know what else I could do. Could you guys give me some help please?

wwnde · Accepted Answer

df

+----------------------------+----+
|date                        |val |
+----------------------------+----+
|2022-04-01T00:00:00.000+0000|24.0|
+----------------------------+----+

solution

from pyspark.sql.functions import to_timestamp
#spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
 
df = (df4.withColumn("date",to_timestamp('date', 'yyyy-MM-dd'))#Coaerce to datettime
      .withColumn("month",month('date'))#extract month
      .withColumn("year",year('date'))#extract year
      
      )
df.show(truncate=False)

outcome

+-------------------+----+-----+----+
|date               |val |month|year|
+-------------------+----+-----+----+
|2022-04-01 00:00:00|24.0|4    |2022|
+-------------------+----+-----+----+

How can I extract year and month from a string column in Pyspark?

Answers (1)

Related Questions