Reputation: 190
I want to parse this log sample
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org
This is how I'm creating table and loading the data into it
CREATE TABLE LogParserSample(
month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)')
stored as textfile;
I'm using these websites to generate regex
These two are the regexes I'm using
(\w{3})\s+(\w{1})\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)
(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)
Loading data and selecting
load data local inpath '/home/programmeur_v/serde_dataset.txt' into table LogParserSample;
select * from LogParserSample;
Output as null
hive> select * from LogParserSample;
OK
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
Time taken: 0.094 seconds, Fetched: 7 row(s)
Just new to the hive so don't know what exactly is the problem
Upvotes: 3
Views: 152
Reputation: 31470
We need to use Java equivalent regex while creating Hive table using regex serde.
Try with below ddl:
hive> CREATE TABLE LogParserSample(
month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)')
stored as textfile;
hive> select * from LogParserSample;
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| month_name | day | time | host | event | log |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| May | 3 | 11:52:54 | cdh-dn03 | init: | tty (/dev/tty6) main process (1208) killed by TERM signal |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | registered taskstats version 1 |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | sr0: scsi3-mmc drive: 32x/32x xa/form2 tray |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | nf_conntrack version 0.5.0 (7972 buckets, 31888 max) |
| May | 3 | 11:53:57 | cdh-dn03 | kernel: | hrtimer: interrupt took 11250457 ns |
| May | 3 | 11:53:59 | cdh-dn03 | ntpd_initres[1705]: | host name not found: 0.rhel.pool.ntp.org |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
Use this link to generate java equivalent regex.
Upvotes: 3