Reputation: 143
I have a Phoenix table with a column of DATE data type. The format is yyyy-MM-dd hh:mm:ss, with both the date and time parts maintained to millisecond accuracy.
With PySpark, I'm trying to load this table as the documentation says:
df = sqlContext.read \
.format("org.apache.phoenix.spark") \
.option("table", "MYTABLE") \
.option("zkUrl", "localhost:2181") \
.load()
The problem is that the corresponding column of the created DataFrame is pyspark.sql.types.DateType(), a yyyy-MM-dd type, so I have lost the hh:mm:ss accuracy. Any suggestions?
If I define the Phoenix column as TIMESTAMP, the mapping in PySpark is TimestampType(), but I don't want to define the column as TIMESTAMP; I don't need TIMESTAMP accuracy.
I would like to define the column as DATE in Phoenix and have it mapped to TimestampType() in PySpark. Is this possible?
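For illustration, this is roughly how the mismatch shows up when inspecting the loaded DataFrame (a minimal sketch; EVENT_DATE is just a placeholder name for one of my DATE columns):
from pyspark.sql import types as T

# EVENT_DATE is a hypothetical column name, substitute your own
print(df.schema["EVENT_DATE"].dataType)                           # DateType
print(isinstance(df.schema["EVENT_DATE"].dataType, T.DateType))   # True
df.select("EVENT_DATE").show(1, truncate=False)                   # e.g. 2019-01-01, time of day lost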
Upvotes: 1
Views: 186
Reputation: 1
Use the dateAsTimestamp option as below; it will read the DATE column from the Phoenix table as TimestampType.
df = sqlContext.read \
.format("org.apache.phoenix.spark") \
.option("table","MYTABLE) \
.option("zkUrl","localhost:2181") \
.option("dateAsTimestamp","true") \
.load()
The Spark schema will return TIMESTAMP even though the Phoenix data type is DATE.
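A quick way to double-check the result after loading with dateAsTimestamp (a minimal sketch; EVENT_DATE is again a placeholder column name):
from pyspark.sql import types as T

df.printSchema()   # the DATE column should now appear as timestamp
assert isinstance(df.schema["EVENT_DATE"].dataType, T.TimestampType)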
Upvotes: 0