Reputation: 591
So I have input data coming like this:
HEADER_A|HEADER_B|HEADER_C|HEADER_D
2018-09-28 22:12:09.0461760|first|last|123
2018-09-28 22:14:11.0123550|first|second|124
I then read this into a dataframe, and save it as a temporary view named temp_view
. Everything is read in as a string.
I will then query this temp_view:
val df = sqlContext.sql("""
SELECT
from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss') as ETL_RunDate,
HEADER_A as timestamp,
HEADER_B as string,
HEADER_c as string,
HEADER_D as string
FROM temp_view
""")
And then write that dataframe to a file. This file with then look like
2018-10-23 05:32:49|2018-09-28T22:12:09.046Z|first|last|123
2018-10-23 05:32:49|2018-09-28T22:14:11.123Z|first|second|124
If I then use this data as the data for a hive table with schema:
CREATE EXTERNAL TABLE `testdb.testtable`(
`runts` timestamp COMMENT '',
`header_a` timestamp COMMENT '',
`header_b` string COMMENT '',
`header_c` string COMMENT '',
`header_d` string COMMENT '')
And query this table -
select * from testdb.testtable;
header_a
is coming as null when it should be coming as the underlying data is there.
Why is this happening and how can I fix this?
Upvotes: 2
Views: 759
Reputation: 1224
The issue is that the timestamp format in your output file yyyy-MM-dd'T'HH:mm:ss.SSSX
is different from the default Hive timestamp format yyyy-mm-dd hh:mm:ss[.f...]
.
You have two options:
df.write.format("csv").option("dateFormat", "yyyy-MM-dd hh:mm:ss.SSSSSSS")
ALTER TABLE testtable SET SERDEPROPERTIES ("timestamp.formats"= "yyyy-MM-dd'T'HH:mm:ss.SSSX")
Upvotes: 1