nima_zoli

Pig Latin REGEX_EXTRACT_ALL

I am new to Pig. I need to parse a Catalina log whose format is shown in the line below. I need my pattern to also read the next line, which starts with INFO, but it does not do that.

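-- TextLoader produces one record (a single chararray field) per line of the file
A = LOAD 'catalina.' USING TextLoader AS (line:chararray);
-- Backslashes are doubled because the regex sits inside a Pig string literal
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^([a-zA-Z]{3}\\s[0-9]{1,2},\\s[0-9]{4}\\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}\\s[A-Z]{2})(.*)INFO:(.*)$'));

STORE B INTO 'output';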

Input:

Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.Catalina load INFO: Initialization processed in 470 ms.
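To see why the pattern never reaches the INFO text, the sketch below mimics what `REGEX_EXTRACT_ALL` does with each record, written in Python rather than Pig so it can run on its own. It assumes the whole record has to match (the pattern is anchored with `^` and `$`, so a partial match would not help anyway) and that the constructs used here behave the same in Python's `re` as in the Java regex engine Pig uses. `regex_extract_all` is a hypothetical helper, not Pig's implementation, and `[a-zA-z]` is written as `[a-zA-Z]`.

```python
import re

# The question's pattern as the regex engine sees it, after Pig unescapes the
# doubled backslashes (with the [a-zA-z] typo corrected to [a-zA-Z]).
QUESTION_PATTERN = (
    r"^([a-zA-Z]{3}\s[0-9]{1,2},\s[0-9]{4}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s[A-Z]{2})"
    r"(.*)INFO:(.*)$"
)

def regex_extract_all(line, pattern):
    """Hypothetical stand-in for REGEX_EXTRACT_ALL: return every capture group
    as a tuple, or None when the record does not match."""
    m = re.fullmatch(pattern, line)
    return m.groups() if m else None

# The sample exactly as posted: date, logger and INFO text on one line.
one_line = ("Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.Catalina load "
            "INFO: Initialization processed in 470 ms.")

# The same entry with INFO starting the next line, as the question describes.
# TextLoader hands these to the FOREACH as two separate records.
two_lines = ["Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.Catalina load",
             "INFO: Initialization processed in 470 ms."]

print(regex_extract_all(one_line, QUESTION_PATTERN))  # three groups
for record in two_lines:
    print(regex_extract_all(record, QUESTION_PATTERN))  # None for each record
```

If the log really does put INFO on its own line, no single-line pattern can match the whole entry while each line arrives as a separate record: the first line has no `INFO:` and the second has no timestamp.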

Answers (1)

Brian R Armstrong

Your problem is the two (.*) captures immediately before and after INFO. You want this instead:

^([a-zA-Z]{3}\s[0-9]{1,2},\s[0-9]{4}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s[A-Z]{2})\s([\w\.]+)\sINFO:\s(.*)$
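A quick way to sanity-check a pattern like this is to run it against the sample line outside Pig. The sketch below does that in Python, assuming Python's `re` treats these constructs the same way as the Java regex engine Pig uses, and writing `[a-zA-z]` as `[a-zA-Z]`. In the sample, the method name `load` sits between the class name and `INFO:`, so the sketch also tries a hypothetical variant with an extra `(\w+)` group for it; that extra group is an assumption about the log layout, not part of the answer.

```python
import re

# Timestamp group shared by both patterns, e.g. "Nov 3, 2016 11:00:06 AM".
DATE = r"([a-zA-Z]{3}\s[0-9]{1,2},\s[0-9]{4}\s[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s[A-Z]{2})"

# The answer's pattern: INFO: must directly follow the dotted class name.
ANSWER_PATTERN = "^" + DATE + r"\s([\w\.]+)\sINFO:\s(.*)$"

# Hypothetical variant: an extra (\w+) group for the method name ("load").
VARIANT_PATTERN = "^" + DATE + r"\s([\w\.]+)\s(\w+)\sINFO:\s(.*)$"

line = ("Nov 3, 2016 11:00:06 AM org.apache.catalina.startup.Catalina load "
        "INFO: Initialization processed in 470 ms.")

def groups(pattern, text):
    """Return the capture groups if the whole text matches, else None."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None

print(groups(ANSWER_PATTERN, line))   # None: "load" comes before INFO:
print(groups(VARIANT_PATTERN, line))  # timestamp, class, method, message
```

Inside the Pig script each backslash has to be doubled (`\\s`, `\\w`, `\\.`), as in the question's REGEX_EXTRACT_ALL call.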
