Miguel A. Luna

Reputation: 143

Phoenix Spark driver maps a Phoenix DATE column as DateType()

I have a Phoenix table with a column of type DATE. Per the Phoenix documentation, the format is yyyy-MM-dd hh:mm:ss, with both the date and time parts maintained to millisecond accuracy.

On the Spark side, I'm trying to load this table with PySpark as the documentation says:

df = sqlContext.read \
.format("org.apache.phoenix.spark") \
.option("table", "MYTABLE") \
.option("zkUrl", "localhost:2181") \
.load()

The problem is that the corresponding column of the created DataFrame is pyspark.sql.types.DateType(), a yyyy-MM-dd type, so I have lost the hh:mm:ss precision. Any suggestion?
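The precision loss described above isn't Spark-specific; it's the same thing that happens when a date-with-time value is narrowed to a date-only type. A minimal stdlib sketch (the timestamp value is made up for illustration):

```python
from datetime import datetime

# A value as Phoenix stores a DATE column: date plus time-of-day.
ts = datetime(2024, 3, 15, 13, 45, 30)

# What a DateType() column keeps: the date part only.
d = ts.date()

print(ts.isoformat())  # 2024-03-15T13:45:30
print(d.isoformat())   # 2024-03-15 -- the hh:mm:ss part is gone
```

Once a value has been narrowed like this, casting it back to a timestamp only yields midnight; the time-of-day cannot be recovered, which is why the mapping has to be changed at read time.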

If I define the Phoenix column as TIMESTAMP, the mapping in PySpark is TimestampType(), but I don't want to define the column as TIMESTAMP; I don't need TIMESTAMP-level accuracy.

I would like to define the column as DATE in Phoenix and get TimestampType() in PySpark. Is this possible?

Upvotes: 1

Views: 186

Answers (1)

Alok Pandey

Reputation: 1

Use the option dateAsTimestamp as below; it will read the Phoenix DATE column as TimestampType.

df = sqlContext.read \
.format("org.apache.phoenix.spark") \
.option("table", "MYTABLE") \
.option("zkUrl", "localhost:2181") \
.option("dateAsTimestamp", "true") \
.load()

The Spark schema will then report TimestampType even though the Phoenix data type is DATE.

Upvotes: 0
