Reputation: 985
I am trying to convert a string to a timestamp format in Pyspark.
from pyspark.sql.functions import date_format
df = spark.createDataFrame([('28/Mar/2021:06:29:54 -0700',)], ['dt'])
df.select(date_format('dt', 'd/M/y:h:m:s Z').alias('date')).collect()
This does not seem to work, probably because the date_format function does not recognize this as a valid format. I get this:
[Row(date=None)]
Is there any way I can make the PySpark functions understand this format, similar to what the datetime module in Python does?
from datetime import datetime
datetime.strptime('28/Mar/2021:06:29:54 -0700', '%d/%b/%Y:%H:%M:%S %z')
When I pass the format, it creates a valid datetime object:
datetime.datetime(2021, 3, 28, 6, 29, 54, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200)))
Upvotes: 1
Views: 1685
Reputation: 42422
Unlike the Python datetime module, Spark's datetime patterns require you to specify the number of pattern letters for each field (e.g. dd for a two-digit day, MMM for an abbreviated month name). Also, use to_timestamp to convert a string to a timestamp type; date_format is for the other way round, i.e. converting a timestamp type to a string.
from pyspark.sql.functions import to_timestamp
df = spark.createDataFrame([('28/Mar/2021:06:29:54 -0700',)], ['dt'])
df.select(to_timestamp('dt', 'dd/MMM/yyyy:HH:mm:ss Z').alias('date')).collect()
# gives [Row(date=datetime.datetime(2021, 3, 28, 14, 29, 54))]
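For comparison, the same parse/format round trip can be sketched with the standard-library datetime module you mentioned: strptime plays the role of to_timestamp (string to timestamp) and strftime plays the role of date_format (timestamp to string). This is just an analogy to illustrate the direction of each function, not Spark code:

```python
from datetime import datetime

# Parse the string into an aware datetime (analogue of to_timestamp).
ts = datetime.strptime('28/Mar/2021:06:29:54 -0700', '%d/%b/%Y:%H:%M:%S %z')

# Format it back to a string (analogue of date_format).
s = ts.strftime('%d/%b/%Y:%H:%M:%S %z')
print(s)  # 28/Mar/2021:06:29:54 -0700
```

Note that, unlike strptime/strftime which share one format string, Spark's to_timestamp and date_format each take their own pattern, so the parse and format patterns can differ.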
Upvotes: 1