Reputation: 1256
I have a Spark dataframe where the timestamp is in milliseconds.
+-----------------------+
|CALC_TS                |
+-----------------------+
|2021-01-27 01:35:05.043|
|2021-01-27 01:35:05.043|
|2021-01-27 01:35:05.043|
+-----------------------+
I want to make it show microseconds like so:
+--------------------------+
|CALC_TS                   |
+--------------------------+
|2021-01-27 01:35:05.043000|
|2021-01-27 01:35:05.043000|
|2021-01-27 01:35:05.043000|
+--------------------------+
So basically I would like the milliseconds portion expressed in microseconds. In the above example, the 43 milliseconds in the first dataframe would show as 43 thousand microseconds in the second dataframe.
I have tried:
df.withColumn('TIME', to_timestamp('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSSSSS'))
and
df.withColumn('TIME', col('CALC_TS').cast("timestamp"))
But both give the same result and strip the last 3 zeroes. Is there a way to achieve this?
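For reference, a minimal way to reproduce this (sample values illustrative):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Three identical rows, as in the dataframe shown above
df = spark.createDataFrame([("2021-01-27 01:35:05.043",)] * 3, ["CALC_TS"])

# Casting keeps only millisecond precision when displayed
df.withColumn('TIME', col('CALC_TS').cast("timestamp")).show(truncate=False)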
Upvotes: 2
Views: 1889
Reputation: 3419
You can use rpad.
Right-pad with trailing zeros up to the expected length of your timestamp string. In your case, that is 26 characters (for the format yyyy-MM-dd HH:mm:ss.SSSSSS).
from pyspark.sql.functions import *

df.withColumn('CALC_TS_1', col('CALC_TS').cast("timestamp"))\
  .withColumn('CALC_TS_1', rpad(col('CALC_TS_1').cast('string'), 26, '0'))\
  .show(truncate=False)
+--------------------------+--------------------------+
|CALC_TS                   |CALC_TS_1                 |
+--------------------------+--------------------------+
|2021-01-27 01:35:05.043   |2021-01-27 01:35:05.043000|
|2021-01-27 01:35:05.043567|2021-01-27 01:35:05.043567|
+--------------------------+--------------------------+
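One edge case worth guarding against (my addition, not part of the original approach): if a timestamp has no fractional part at all, its string cast has no '.' in it, so blind right-padding would append zeros to the seconds. A sketch of a guard:
from pyspark.sql.functions import col, concat, lit, rpad, when

ts_str = col('CALC_TS').cast('timestamp').cast('string')

# Only pad digits after a decimal point; add '.' first when the fraction was dropped
df.withColumn(
    'CALC_TS_1',
    when(ts_str.contains('.'), rpad(ts_str, 26, '0'))
    .otherwise(rpad(concat(ts_str, lit('.')), 26, '0'))
).show(truncate=False)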
Upvotes: 2
Reputation: 9427
to_timestamp(timestamp_str[, fmt]) accepts a string and returns a timestamp (type). If your CALC_TS is already a timestamp, as you said, you should instead use df.withColumn('TIME', date_format('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSSSSS')) to format it as a string with microsecond precision. From the Spark reference:
Fraction: Use one or more (up to 9) contiguous 'S' characters, e.g. SSSSSS, to parse and format fraction of second. For parsing, the acceptable fraction length can be [1, the number of contiguous 'S']. For formatting, the fraction length would be padded to the number of contiguous 'S' with zeros. Spark supports datetime of micro-of-second precision, which has up to 6 significant digits, but can parse nano-of-second with exceeded part truncated.
For Spark 2.4, and just to make the timestamp field look like it has microsecond precision, perhaps you can "fake" trailing zeroes while formatting, like this: date_format('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSS000')
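Put together, a quick sketch of both variants (assuming a timestamp-typed CALC_TS, as above):
from pyspark.sql.functions import date_format

# Spark 3+: per the reference above, the fraction is zero-padded to six 'S' digits
df.withColumn('TIME', date_format('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSSSSS'))

# Spark 2.4: no microsecond support in the old parser, so append literal zeros
df.withColumn('TIME', date_format('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSS000'))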
Upvotes: 2
Reputation: 32660
If the column CALC_TS is of type string, first convert it to TimestampType using the to_timestamp and unix_timestamp functions, then use date_format to format it with a 6-digit fraction (microseconds):
from pyspark.sql import functions as F

df.printSchema()
#root
# |-- CALC_TS: string (nullable = true)

df1 = df.withColumn(
    'TIME',
    F.to_timestamp(
        F.unix_timestamp('CALC_TS', "yyyy-MM-dd HH:mm:ss.SSS")        # whole seconds
        + F.substring_index('CALC_TS', '.', -1).cast('float') / 1000  # milliseconds part
    )
).withColumn(
    "TIME_FORMAT",
    F.date_format("TIME", "yyyy-MM-dd HH:mm:ss.SSSSSS")
)
df1.show(truncate=False)
#+-----------------------+-----------------------+--------------------------+
#|CALC_TS                |TIME                   |TIME_FORMAT               |
#+-----------------------+-----------------------+--------------------------+
#|2021-01-27 01:35:05.043|2021-01-27 01:35:05.043|2021-01-27 01:35:05.000043|
#|2021-01-27 01:35:05.043|2021-01-27 01:35:05.043|2021-01-27 01:35:05.000043|
#|2021-01-27 01:35:05.043|2021-01-27 01:35:05.043|2021-01-27 01:35:05.000043|
#+-----------------------+-----------------------+--------------------------+

df1.printSchema()
#root
# |-- CALC_TS: string (nullable = true)
# |-- TIME: timestamp (nullable = true)
# |-- TIME_FORMAT: string (nullable = true)
If the column is already of type timestamp, simply use date_format
as in the above code.
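For example, under that assumption:
# CALC_TS already of TimestampType: the conversion step can be skipped
df.withColumn('TIME_FORMAT', F.date_format('CALC_TS', 'yyyy-MM-dd HH:mm:ss.SSSSSS')).show(truncate=False)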
Upvotes: 1