Reputation: 450
So, I have a column that looks like a timestamp but it's actually a string column. The column looks like this: 2022-04-01T00:00:00.000+0000. I tried a couple of things and they didn't work:
.withColumn("year", year(to_date(col("full_time"),"yyyy-MM-dd")))
.withColumn("year", to_date(col("cycle.start_time"),"yyyy"))
Neither of them worked, so now I don't know what else I could do. Could you give me some help, please?
Upvotes: 1
Views: 2451
Reputation: 26686
df
+----------------------------+----+
|date |val |
+----------------------------+----+
|2022-04-01T00:00:00.000+0000|24.0|
+----------------------------+----+
solution
from pyspark.sql.functions import to_timestamp, month, year

# may be needed on Spark 3+ so the partial pattern below can parse the string
#spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df = (df.withColumn("date", to_timestamp('date', 'yyyy-MM-dd'))  # coerce string to timestamp
        .withColumn("month", month('date'))                      # extract month
        .withColumn("year", year('date'))                        # extract year
     )
df.show(truncate=False)
outcome
+-------------------+----+-----+----+
|date |val |month|year|
+-------------------+----+-----+----+
|2022-04-01 00:00:00|24.0|4 |2022|
+-------------------+----+-----+----+
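If you are on Spark 3 and prefer not to enable the legacy time parser, an alternative is to give to_timestamp the full pattern so it can parse the string, including the offset, directly. This is a minimal sketch, assuming every value carries a +0000-style offset like the sample above:

from pyspark.sql.functions import to_timestamp, month, year

df = (df.withColumn("date", to_timestamp('date', "yyyy-MM-dd'T'HH:mm:ss.SSSZ"))  # parse the full ISO-style string, offset included
        .withColumn("month", month('date'))  # extract month
        .withColumn("year", year('date'))    # extract year
     )
df.show(truncate=False)

Note that to_timestamp typically returns NULL for rows that don't match the pattern, so it's worth checking for NULLs after the conversion.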
Upvotes: 1