Reputation: 57
I would like to store table into HBase using Hive (hive hbase integration ) My table contains a field typed TIMESTAMP (like DATE) I've done some research and i discovered that TIMESTAMP is not supported by HBASE, some what should I do?
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating dat at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529) ... 9 more Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:185)
at org.apache.hadoop.hive.serde2.lazy.LazyTimestamp.init(LazyTimestamp.java:74)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:219)
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:192)
at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:188)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
Upvotes: 1
Views: 2439
Reputation: 2573
The easiest thing to do would be to convert the TIMESTAMP
into a STRING
, INT
, or FLOAT
. This will have the unfortunate side effect of giving up Hive's built in TIMESTAMP
support. Due to this you will lose
TIMESTAMP
TIMESTAMPS
s of different formatsTIMESTAMP
s.The first two losses are mitigated if you choose a single format for your own timestamps and stick to it. The last is not a huge loss because only two Hive date functions actually operate on TIMESTAMP
s. Most of them operate on STRING
s. If you aboslutely needed from_utc_timestamp
and from_utc_timestamp
, you can write your own UDF.
If you go with STRING
and only need the date, I would go with a yyyy-mm-dd
format. If you need the time as well go with yyyy-mm-dd hh:mm:ss
, or yyyy-mm-dd hh:mm:ss[.fffffffff]
if you need partial second timestamps. This format also is also consistent with how Hive expects TIMESTAMP
s and is the form required for most Hive date functions.
If you with INT
you again have a couple of options. If only the date is important, YYYYMMDD
fits in with the "basic" format of ISO 8601 (This is a form I've personally used and found convenient when I didn't need to perform any date operations on the column). If the time is also important, go with YYYYMMDDhhmmss
. This an acceptable variant for the basic form of ISO 8601 for date time. If you need fractional second timing, then use a FLOAT
and the form YYYYMMDDhhmmss.fffffffff
. Note that neither of these forms is consitent with how Hive expects integer or floating point TIMESTAMP
s.
If the concept of calendar dates and time of day isn't important at all, then using an INT
as a Unix timestamp is probably the easiest, or a FLOAT
if you also need fractional seconds. This form is consistent with how Hive expects TIMESTAMP
s.
Upvotes: 1