test acc

Reputation: 591

Timestamp in Hive read from a CSV created by Spark coming as null

So I have input data coming like this:

HEADER_A|HEADER_B|HEADER_C|HEADER_D
2018-09-28 22:12:09.0461760|first|last|123
2018-09-28 22:14:11.0123550|first|second|124

I then read this into a DataFrame and save it as a temporary view named temp_view. Everything is read in as a string.

I will then query this temp_view:

val df = sqlContext.sql("""
    SELECT
    from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss') as ETL_RunDate,
    HEADER_A,
    HEADER_B,
    HEADER_C,
    HEADER_D
    FROM temp_view
""")

And then write that DataFrame to a file. The file will then look like:

2018-10-23 05:32:49|2018-09-28T22:12:09.046Z|first|last|123
2018-10-23 05:32:49|2018-09-28T22:14:11.123Z|first|second|124

If I then use this data as the data for a hive table with schema:

CREATE EXTERNAL TABLE `testdb.testtable`(
  `runts` timestamp COMMENT '',
  `header_a` timestamp COMMENT '',
  `header_b` string COMMENT '',
  `header_c` string COMMENT '',
  `header_d` string COMMENT '')

And query this table -

select * from testdb.testtable;

header_a comes back as null, even though the underlying data is there.

Why is this happening and how can I fix this?

Upvotes: 2

Views: 759

Answers (1)

DemetriKots

Reputation: 1224

The issue is that the timestamp format in your output file, yyyy-MM-dd'T'HH:mm:ss.SSSX, is different from the default Hive timestamp format, yyyy-mm-dd hh:mm:ss[.f...].

You have two options:

  • Write out the file using the default format. You would have to update your write to something like df.write.format("csv").option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSSS") (note it is timestampFormat, not dateFormat, that applies to timestamp columns, and HH for a 24-hour clock)
  • Update the SERDEPROPERTIES of the table to read the format – ALTER TABLE testtable SET SERDEPROPERTIES ("timestamp.formats"= "yyyy-MM-dd'T'HH:mm:ss.SSSX")
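The mismatch behind the NULLs can be reproduced without Hive or Spark: the value written to the file is in ISO-instant form with a 'T' separator and a 'Z' suffix, which the default Hive layout cannot parse, so the SerDe yields NULL for that column. A minimal plain-Scala sketch of the check (the object and method names are illustrative, and the fractional part is shortened to three digits to match the sample output):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object HiveTimestampCheck {
  // Approximation of Hive's default timestamp layout:
  // no 'T' separator, no zone suffix
  private val hiveLayout = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  // True if the value parses under the default layout;
  // values that fail are the ones Hive turns into NULL
  def parses(value: String): Boolean =
    Try(LocalDateTime.parse(value, hiveLayout)).isSuccess
}
```

What Spark wrote fails the check (HiveTimestampCheck.parses("2018-09-28T22:12:09.046Z") is false), while the default layout passes (HiveTimestampCheck.parses("2018-09-28 22:12:09.046") is true), which is exactly why either rewriting the file or widening timestamp.formats fixes the NULLs.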

Upvotes: 1
