Martin Walczyński

Reputation: 93

PySpark string column to timestamp conversion

I am currently learning PySpark and I need to convert a column of strings in the format 13/09/2021 20:45 into a timestamp of just the hour, 20:45.

Now I figured that I could do this with q1.withColumn("timestamp", to_timestamp("ts")).show() (where q1 is my dataframe and ts is the column in question) to convert my dd/MM/yyyy HH:mm input into a timestamp, however the values returned are only null. I therefore realised that to_timestamp by default expects its input in PySpark's standard timestamp format (yyyy-MM-dd HH:mm:ss) before it can produce a proper timestamp. Hence my question:

How can I convert the column of dd/MM/yyyy HH:mm strings into a format that PySpark understands, so that I can convert it to a timestamp?
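
For reference, here is a minimal sketch of what I am working with (the single row is made up to match the format of my data):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()

# one made-up row in the same dd/MM/yyyy HH:mm format as my real data
q1 = spark.createDataFrame([("13/09/2021 20:45",)], ["ts"])

# this is the call described above; it shows null instead of the parsed timestamp
q1.withColumn("timestamp", to_timestamp("ts")).show()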

Upvotes: 0

Views: 471

Answers (2)

pltc

Reputation: 6082

There are different ways you can do that:

from pyspark.sql import functions as F

# use substring: take the 5 characters starting at 1-based position 12
df.withColumn('hour', F.substring('A', 12, 5)).show()

# use regex: extract the HH:mm part
df.withColumn('hour', F.regexp_extract('A', r'\d{2}:\d{2}', 0)).show()

# use datetime parsing: parse with an explicit pattern, then format only hour and minute
df.withColumn('hour', F.from_unixtime(F.unix_timestamp('A', 'dd/MM/yyyy HH:mm'), 'HH:mm')).show()

# Output
# +----------------+-----+
# |               A| hour|
# +----------------+-----+
# |13/09/2021 20:45|20:45|
# +----------------+-----+

Upvotes: 1

user15144515

Reputation:

unix_timestamp may help with your problem.

Just try this: Convert pyspark string to date format
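
A minimal sketch of that idea, assuming a dataframe df with a string column ts in the dd/MM/yyyy HH:mm pattern from the question:

from pyspark.sql import functions as F

# unix_timestamp parses the string with an explicit pattern into epoch seconds;
# casting that value back to timestamp gives a proper TimestampType column
df.withColumn(
    'ts_parsed',
    F.unix_timestamp('ts', 'dd/MM/yyyy HH:mm').cast('timestamp')
).show(truncate=False)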

Upvotes: 0
