Reputation: 7411
I am reading some JSON file into a data frame and I want to convert a field in it into a specific format, the JSON file has server_received_time
with the following format as String I want to convert it to be like yyyy-MM-dd:hh
"server_received_time":"2019-01-26T03:04:36Z"
but whatever I tied just returned null
df.select("server_received_time")
.withColumn("tx_date", to_date($"server_received_time", "yyy-MM-dd:hh").cast("timestamp"))
.withColumn("tx_date2", to_timestamp($"server_received_time", "yyy-MM-dd:hh").cast("timestamp"))
.withColumn("tx_date3", to_date(unix_timestamp($"server_received_time", "yyyy-MM-dd:hh").cast("timestamp")))
.withColumn("tx_date4", to_utc_timestamp(to_timestamp(col("server_received_time"), "yyyy-MM-dd:hh"), "UTC"))
.withColumn("tx_date5", to_timestamp($"server_received_time","yyyy-MM-dd:hh"))
.show(10, false)
+--------------------+-------+--------+--------+--------+--------+
|server_received_time|tx_date|tx_date2|tx_date3|tx_date4|tx_date5|
+--------------------+-------+--------+--------+--------+--------+
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
|2019-02-18T16:02:20Z|null |null |null |null |null |
+--------------------+-------+--------+--------+--------+--------+
I want to have the server_received_time
in this format yyyy-MM-dd:hh
Upvotes: 0
Views: 3570
Reputation: 936
The format is different. This should work as below:
df.select(date_format(to_timestamp($"server_received_time", "yyyy-MM-dd'T'HH:mm:ss'Z'"), "yyyy-MM-dd:hh").as("custom_date"))
Upvotes: 1
Reputation: 46
to_
methods take actual format, not desired output format. To format you have to convert data back to string
import org.apache.spark.sql.functions._
val df = Seq("2019-02-18T16:02:20Z").toDF("server_received_time")
df.select(date_format(to_timestamp($"server_received_time"), "yyy-MM-dd:hh")).show
// +---------------------------------------------------------------+
// |date_format(to_timestamp(`server_received_time`), yyy-MM-dd:hh)|
// +---------------------------------------------------------------+
// | 2019-02-18:05|
// +---------------------------------------------------------------+
Upvotes: 3