Trevor.Screws

Reputation: 581

PySpark - Cast Long Epoch (in Milliseconds) to TimestampType with Native Spark Functions

I'm using PySpark to read JSON files, process the data, and write the results to Parquet files.

The incoming data has a date field stored as milliseconds since the Unix epoch. For example, 1541106106796 represents Thursday, November 1, 2018 9:01:46.796 PM (UTC).

A working solution uses a UDF built on the Python datetime library:

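from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

def format_datetime(ts):
    # ts is the epoch in milliseconds; fromtimestamp expects seconds
    return datetime.fromtimestamp(ts / 1000.0)

...
get_timestamp = udf(lambda x: format_datetime(int(x)), TimestampType())
df = df.withColumn("timestamp", get_timestamp(df.ts))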

Is there a solution that only uses native Spark functions?

Upvotes: 1

Views: 463

Answers (1)

notNull

Reputation: 31460

Use from_unixtime on the seconds part of the value, extract the milliseconds (the last three digits), append them after a decimal point, and finally cast the resulting string to timestamp type.

df.show()
#+-------------+
#|           ts|
#+-------------+
#|1541106106796|
#+-------------+

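from pyspark.sql.functions import expr

# Format the seconds part, then append "." plus the last three digits (the milliseconds)
df.withColumn("ts1",expr('concat_ws(".",from_unixtime(substring(ts,1,length(ts)-3),"yyyy-MM-dd HH:mm:ss"),substring(ts,length(ts)-2,length(ts)))').cast("timestamp")).\
show(10,False)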
#+-------------+-----------------------+
#|ts           |ts1                    |
#+-------------+-----------------------+
#|1541106106796|2018-11-01 16:01:46.796|
#+-------------+-----------------------+

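To go the other way and build the epoch in milliseconds from a formatted string, use the unix_timestamp and regexp_extract functions. unix_timestamp parses the string in the session time zone, so the resulting value depends on that setting.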

Example:

df.show(10,False)
#+-----------------------------------------+
#|sample                                   |
#+-----------------------------------------+
#|Thursday, November 1, 2018 9:01:46.796 PM|
#+-----------------------------------------+

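from pyspark.sql.functions import col, concat_ws, regexp_extract, unix_timestamp

# unix_timestamp yields whole seconds; the regex pulls the fractional digits to append as milliseconds
df.withColumn("ts",concat_ws('',unix_timestamp(col("sample"),"E, MMMM d, yyyy hh:mm:ss.SSS a"),regexp_extract(col("sample"),"\\.(.*)\\s+",1))).\
show(10,False)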
#+-----------------------------------------+-------------+
#|sample                                   |ts           |
#+-----------------------------------------+-------------+
#|Thursday, November 1, 2018 9:01:46.796 PM|1541124106796|
#+-----------------------------------------+-------------+
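
The value above is 18,000,000 ms (5 hours) larger than the 1541106106796 in the question. A plain-Python sketch of the same parsing shows how the result shifts with the time zone used to interpret the string. The helper to_epoch_millis is hypothetical, and the UTC-5 offset is an assumption inferred from the outputs shown in this answer, not something the answer states.

```python
from datetime import datetime, timedelta, timezone

sample = "Thursday, November 1, 2018 9:01:46.796 PM"

# %f accepts one to six fractional digits, so ".796" is read as 796 ms
naive = datetime.strptime(sample, "%A, %B %d, %Y %I:%M:%S.%f %p")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(dt, utc_offset_hours):
    """Interpret a naive datetime at a fixed UTC offset and return epoch milliseconds."""
    aware = dt.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    # Integer division of timedeltas avoids floating-point rounding
    return (aware - EPOCH) // timedelta(milliseconds=1)

as_utc = to_epoch_millis(naive, 0)       # string read as UTC
as_minus5 = to_epoch_millis(naive, -5)   # string read as UTC-5 (assumed session zone)

print(as_utc)     # 1541106106796
print(as_minus5)  # 1541124106796
```

If the original epoch is the goal, the string has to be parsed in the same time zone it was written in.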

Upvotes: 1
