Kulasangar
Kulasangar

Reputation: 9454

Getting Null after extracting data from HDFS in Hive?

What have I done so far is, I've setup Hadoop and Hive in my linux enviroment and I'm trying to extract data from an HDFS file into my Hive table.

This is how the log line looks like:

apache 2013-10-09T14:04:32Zphp129.124.201.110/EKEE.php20019705910Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24

This my Hive query:

CREATE EXTERNAL TABLE LogParserSample(
logtype STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
 'input.regex' = '^[^0-9]+'
) 
STORED AS TEXTFILE
LOCATION '/mypath/';

When I did execute the above the table got created successfully. But then when I tried to query the data from that table using a SELECT statement, I'm getting null instead of the word apache. For this instance, I'm trying to extract the word apache from the above log line. But then I tried executing the same regex in rubular, the outcome was correct as I expected. I'm unable to figure out why!

Upvotes: 0

Views: 159

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44991

'input.regex' = '^([^0-9]+).*'
  1. Only expressions surrounded by brackets are mapped to columns.
  2. The whole record should be covered by the regex

create external table logparsersample
(
    logtype string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
with serdeproperties (
 'input.regex' = '^([^0-9]+).*'
) 
stored as textfile
;

select * from LogParserSample
;

+---------+
| logtype |
+---------+
| apache  |
+---------+

Upvotes: 1

Related Questions