Programmeur
Programmeur

Reputation: 190

Load log data in hive table using serde regex is null

I want to parse this log sample

May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal

May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1

May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray

May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)

May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns

May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

This is how I'm creating table and loading the data into it

CREATE TABLE LogParserSample(

 month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)

 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'   

 WITH SERDEPROPERTIES (

  'input.regex' = '(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)')

 stored as textfile;

I'm using these websites to generate regex

http://www.regexe.com/

http://rubular.com/

These two are the regexes I'm using

(\w{3})\s+(\w{1})\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)

(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)

Loading data and selecting

load data local inpath '/home/programmeur_v/serde_dataset.txt' into table LogParserSample;

select * from LogParserSample;

Output as null

hive> select * from LogParserSample;

OK

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

Time taken: 0.094 seconds, Fetched: 7 row(s)

Just new to the hive so don't know what exactly is the problem

Upvotes: 3

Views: 152

Answers (1)

notNull
notNull

Reputation: 31470

We need to use Java equivalent regex while creating Hive table using regex serde.

Try with below ddl:

hive> CREATE TABLE LogParserSample(
 month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'   
 WITH SERDEPROPERTIES (
  'input.regex' = '(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)')
 stored as textfile;

hive> select * from LogParserSample;
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| month_name  | day  |   time    |   host    |        event         |                                                 log                                                 |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| May         | 3    | 11:52:54  | cdh-dn03  | init:                | tty (/dev/tty6) main process (1208) killed by TERM signal                                           |
| May         | 3    | 11:53:31  | cdh-dn03  | kernel:              | registered taskstats version 1                                                                      |
| May         | 3    | 11:53:31  | cdh-dn03  | kernel:              | sr0: scsi3-mmc drive: 32x/32x xa/form2 tray                                                         |
| May         | 3    | 11:53:31  | cdh-dn03  | kernel:              | piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr  |
| May         | 3    | 11:53:31  | cdh-dn03  | kernel:              | nf_conntrack version 0.5.0 (7972 buckets, 31888 max)                                                |
| May         | 3    | 11:53:57  | cdh-dn03  | kernel:              | hrtimer: interrupt took 11250457 ns                                                                 |
| May         | 3    | 11:53:59  | cdh-dn03  | ntpd_initres[1705]:  | host name not found: 0.rhel.pool.ntp.org                                                            |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+

Use this link to generate java equivalent regex.

Upvotes: 3

Related Questions