Reputation: 209
I have a dataframe with timestamp values such as 2018-02-15T11:39:13.000Z, and I want to convert them to UNIX format using PySpark.
I tried something like
data = datasample.withColumn('timestamp_cast', datasample['timestamp'].cast('date'))
but this loses a lot of information: I only get the year, month and day, even though my source has millisecond precision.
Result: 2018-02-15
Any idea how to get the UNIX format and keep the precision? Thank you!
Upvotes: 11
Views: 18448
Reputation: 9247
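Another possible method is to cast the column directly to an integer, which gives the UNIX time in whole seconds. This assumes the column is already of timestamp type; a string column would need to be cast to timestamp first.
from pyspark.sql import functions as F
df.withColumn('timestamp_unix', F.col('timestamp').cast('int'))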
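Since the question asks about keeping millisecond precision, here is a rough sketch of how the cast approach could be extended. The names `with_unix_columns`, `unix_s`, `unix_ms` and `iso_to_unix_ms` are made up for illustration and do not come from the question. It assumes Spark's default (non-ANSI) cast behaviour, where casting a timestamp to a numeric type yields seconds since the epoch. The plain-Python helper is only a reference for checking what values the Spark columns should contain.

```python
from datetime import datetime, timezone


def iso_to_unix_ms(ts: str) -> int:
    """Reference conversion in plain Python: ISO 8601 UTC string -> epoch milliseconds."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat,
    # so spell the UTC offset out explicitly.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids floating-point rounding of the fraction.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def with_unix_columns(df, col="timestamp"):
    """Hypothetical helper: add epoch seconds and epoch milliseconds columns."""
    # Imported here so the reference function above can be used without Spark.
    from pyspark.sql import functions as F

    # Casting to timestamp first also covers ISO 8601 string columns.
    ts = F.col(col).cast("timestamp")
    return (
        df.withColumn("unix_s", ts.cast("long"))
        # The double cast keeps the fractional seconds; round before
        # converting to long so float error cannot drop a millisecond.
        .withColumn("unix_ms", F.round(ts.cast("double") * 1000).cast("long"))
    )


print(iso_to_unix_ms("2018-02-15T11:39:13.000Z"))  # 1518694753000
```

Usage would be something like `data = with_unix_columns(datasample)`. For the sample value in the question, `unix_s` should come out as 1518694753 and `unix_ms` as 1518694753000.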
Upvotes: 0
Reputation: 2452
You can use the built-in unix_timestamp function in either of the following ways:
from pyspark.sql.functions import unix_timestamp
df = df.withColumn('unix', unix_timestamp('timestamp'))
Or
df = df.selectExpr('unix_timestamp(timestamp)')
Upvotes: 16