Reputation: 1525
Here is what my DataFrame looks like:
+----------------+-------------+
| Business_Date| Code|
+----------------+-------------+
|1539129600000000| BSD|
|1539129600000000| BTN|
|1539129600000000| BVI|
|1539129600000000| BWP|
|1539129600000000| BYB|
+----------------+-------------+
I want to convert the Business_Date column from bigint to timestamp while loading the data into a Hive table. How can I do this?
Upvotes: 3
Views: 8971
Reputation: 43524
You can use pyspark.sql.functions.from_unixtime(), which, per the docs:
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
It appears that your Business_Date is in microseconds, so it needs to be divided by 1,000,000 (1M) to convert it to seconds.
For example:
from pyspark.sql.functions import from_unixtime, col

# Divide by 1M to go from microseconds to seconds, then cast the string to a timestamp
df = df.withColumn(
    "Business_Date",
    from_unixtime(col("Business_Date") / 1000000).cast("timestamp")
)
df.show(truncate=False)
#+---------------------+----+
#|Business_Date |Code|
#+---------------------+----+
#|2018-10-09 20:00:00.0|BSD |
#|2018-10-09 20:00:00.0|BTN |
#|2018-10-09 20:00:00.0|BVI |
#|2018-10-09 20:00:00.0|BWP |
#|2018-10-09 20:00:00.0|BYB |
#+---------------------+----+
from_unixtime returns a string, so you can cast the result to a timestamp.
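To see the difference, here is a small sketch (run against the original DataFrame, before the conversion above) comparing the column types with and without the cast:
from pyspark.sql.functions import from_unixtime, col

# Compare types: from_unixtime alone gives a string, the cast gives a timestamp
df.select(
    from_unixtime(col("Business_Date") / 1000000).alias("as_string"),
    from_unixtime(col("Business_Date") / 1000000).cast("timestamp").alias("as_timestamp")
).printSchema()
#root
# |-- as_string: string (nullable = true)
# |-- as_timestamp: timestamp (nullable = true)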
The new schema is now:
df.printSchema()
#root
# |-- Business_Date: timestamp (nullable = true)
# |-- Code: string (nullable = true)
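Since you want to do this while loading into Hive, one option is to write the converted DataFrame with saveAsTable, or insertInto if the Hive table already exists. This is only a sketch: it assumes Hive support is enabled on your SparkSession, and the table name below is just a placeholder.
# A sketch: persist the converted DataFrame to Hive
# (requires a Hive-enabled SparkSession; "mydb.mytable" is a placeholder name)
df.write.mode("overwrite").saveAsTable("mydb.mytable")

# Or, if the Hive table already exists with a matching schema:
# df.write.insertInto("mydb.mytable")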
Upvotes: 3