LMHadoop
LMHadoop

Reputation: 57

hive hbase integration timestamp

I would like to store table into HBase using Hive (hive hbase integration ) My table contains a field typed TIMESTAMP (like DATE) I've done some research and i discovered that TIMESTAMP is not supported by HBASE, some what should I do?

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating dat at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529) ... 9 more Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] 
at java.sql.Timestamp.valueOf(Timestamp.java:185) 
at org.apache.hadoop.hive.serde2.lazy.LazyTimestamp.init(LazyTimestamp.java:74) 
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:219) 
at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:192) 
at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:188) 
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98) 
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)

Upvotes: 1

Views: 2439

Answers (1)

Daniel Koverman
Daniel Koverman

Reputation: 2573

The easiest thing to do would be to convert the TIMESTAMP into a STRING, INT, or FLOAT. This will have the unfortunate side effect of giving up Hive's built in TIMESTAMP support. Due to this you will lose

  • Read time checks to make sure your column contains a valid TIMESTAMP
  • The ability to transparently use TIMESTAMPSs of different formats
  • The use of Hive UDFs which operate on TIMESTAMPs.

The first two losses are mitigated if you choose a single format for your own timestamps and stick to it. The last is not a huge loss because only two Hive date functions actually operate on TIMESTAMPs. Most of them operate on STRINGs. If you aboslutely needed from_utc_timestamp and from_utc_timestamp, you can write your own UDF.

If you go with STRING and only need the date, I would go with a yyyy-mm-dd format. If you need the time as well go with yyyy-mm-dd hh:mm:ss, or yyyy-mm-dd hh:mm:ss[.fffffffff] if you need partial second timestamps. This format also is also consistent with how Hive expects TIMESTAMPs and is the form required for most Hive date functions.

If you with INT you again have a couple of options. If only the date is important, YYYYMMDD fits in with the "basic" format of ISO 8601 (This is a form I've personally used and found convenient when I didn't need to perform any date operations on the column). If the time is also important, go with YYYYMMDDhhmmss. This an acceptable variant for the basic form of ISO 8601 for date time. If you need fractional second timing, then use a FLOAT and the form YYYYMMDDhhmmss.fffffffff. Note that neither of these forms is consitent with how Hive expects integer or floating point TIMESTAMPs.

If the concept of calendar dates and time of day isn't important at all, then using an INT as a Unix timestamp is probably the easiest, or a FLOAT if you also need fractional seconds. This form is consistent with how Hive expects TIMESTAMPs.

Upvotes: 1

Related Questions