Reputation: 13
I am scraping a log from which I need 3 elements per entry. An added difficulty is that I am parsing the log via readLine() in my Java program, i.e. one (1) line at a time. (If there is a way to read multiple lines when parsing, let me know :) ) NOTE: I have no control over the log output format.
There are 2 possible layouts for what I must extract. Either the log is nice and gives the following:
NICE FORMAT
.text.rank 0x0000000000400b8f 0x351 is_x86.o
where I must grab .text.rank, 0x0000000000400b8f, and 0x351.
Now the not-so-nice case: if the name is too long, it bumps everything else to the next line, as shown below. The only thing after the first element is then one blank space followed by a newline (\n), which gets clobbered by readLine() anyway.
EVIL FORMAT : Note each line is in a separate arraylist entry.
.text.__sfmoreglue
0x0000000000401d00 0x55 /mnt/drv2homelibc_popcorn.a(lib_a-findfp.o)
Therefore what the regex actually sees is:
.text.__sfmoreglue
CORNER CASE FORMAT that also occurs within the log but I DO NOT want
*(.text.unlikely)
Finally, below is the Pattern I am currently using for the first line; pline2 is what is used on the next line when group 2 of the first line is empty.
UPDATE: The pattern below works for the NICE FORMAT and the EVIL FORMAT, but now pattern pline2 has no matches, even though it is correct on regex101.com. Link: https://regex101.com/r/vS7vZ3/9
UPDATE2: I fixed it. I forgot to call m2.find() after creating the matcher for the second line from Pattern pline2. Corrected code is below.
Pattern p = Pattern.compile("^[ \\s](\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*).*");
Pattern pline2 = Pattern.compile("^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");
To give a little background: I first match the name .text.whatever to m.group(1), followed by the address 0x000012345 to m.group(2), and finally the size 0xa48 to m.group(3). This all assumes the log is in the NICE FORMAT. If it is in the EVIL FORMAT, I see that group(2) is empty, so I read the next line of the log into a temp buffer and apply the second pattern pline2 to that new line.
Can someone help me with the regex? Is there a way I can make sure my current line (or even better, just the second grouping) is either the NICE FORMAT or is empty?
As requested my java code:
//1st line pattern
Pattern p = Pattern.compile("^[ \\s](\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*).*");
//conditional 2nd line pattern
Pattern pline2 = Pattern.compile("^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");
while ((temp = br1.readLine()) != null) {
    Matcher m = p.matcher(temp);
    while (m.find()) {
        System.out.println("What regex finds: m1:" + m.group(1) + "# m2:" + m.group(2) + "# m3:" + m.group(3));
        if (!m.group(1).isEmpty() && m.group(2).isEmpty() && m.group(3).isEmpty()) {
            //means we probably hit a long symbol name and important stuff is on the next line
            //save the name at least
            name = m.group(1);
            //read and utilize the next line
            if ((temp = br1.readLine()) == null) {
                return;
            }
            System.out.println("EVILline2:" + temp); //sanity check the input
            System.out.println(pline2.toString()); //sanity check the regex
            Matcher m2 = pline2.matcher(temp);
            //find() must run before group(); otherwise group() throws IllegalStateException
            while (m2.find()) {
                System.out.println("regex line2 finds: m1:" + m2.group(1)); //+"# m2:"+m2.group(2));
                if (m2.group(2).isEmpty()) {
                    size = 0;
                } else {
                    size = Long.parseLong(m2.group(2).replaceFirst("0x", ""), 16);
                }
                addr = Long.parseLong(m2.group(1).replaceFirst("0x", ""), 16);
                System.out.println("#########LONG NAME: " + name + " addr:" + addr + " size:" + size);
            }
        } else { // assume in NICE FORMAT
            //do nice format stuff.
        } //end if/else
    } //end inner while
} //end outer while
As an aside, the output I got before adding m2.find():
line: .text.c_print_results
What regex finds: m1:.text.c_print_results# m2:# m3:
EVIL FORMATline2: 0x00000000004001e6 0x231 c_print_results_x86.o
^\s*([x0-9a-f]*)[ \s]*([x0-9a-f]*)\s*[\w\(\)\.\-]*
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:536)
at java.util.regex.Matcher.group(Matcher.java:496)
at regexTest.regex.grabSymbolsInRange(regex.java:143)
at regexTest.regex.main(regex.java:489)
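For reference, a minimal sketch of what seems to be behind the trace above: Matcher.group() throws IllegalStateException when no match operation (find(), matches() or lookingAt()) has been attempted on that matcher yet. The class and method names here (FindBeforeGroup, groupWithoutFindThrows, parseSecondLine) are made up for illustration; only the pline2 pattern is taken from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroup {
    // Second-line pattern from the question: address, size, then the object file name.
    static final Pattern PLINE2 =
        Pattern.compile("^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");

    // Hypothetical helper: true if group(1) throws because find() was never called.
    static boolean groupWithoutFindThrows(String line) {
        Matcher m = PLINE2.matcher(line);
        try {
            m.group(1);          // no find()/matches()/lookingAt() before this call
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Hypothetical helper: returns {address, size}, or null if no address was captured.
    static long[] parseSecondLine(String line) {
        Matcher m = PLINE2.matcher(line);
        if (!m.find() || m.group(1).isEmpty()) {
            return null;
        }
        long addr = Long.parseLong(m.group(1).replaceFirst("0x", ""), 16);
        long size = m.group(2).isEmpty()
            ? 0
            : Long.parseLong(m.group(2).replaceFirst("0x", ""), 16);
        return new long[] { addr, size };
    }

    public static void main(String[] args) {
        String line = " 0x00000000004001e6 0x231 c_print_results_x86.o";
        System.out.println("group() without find() throws: " + groupWithoutFindThrows(line));
        long[] r = parseSecondLine(line);
        System.out.println("addr=0x" + Long.toHexString(r[0]) + " size=0x" + Long.toHexString(r[1]));
    }
}
```

Since every part of pline2 is optional, find() succeeds even on a line without an address, which is why the sketch also checks that group(1) is not empty before parsing it.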
Upvotes: 1
Views: 223
Reputation: 4259
You have a few issues with your pattern.
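So, instead of your posted regex, you can try the following (the trailing /g is regex101's global flag; leave it out of the Java string and loop with find() instead):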
(\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]+([x0-9a-f]*)/g
Tested on Regex101.com:
https://regex101.com/r/lM4bQ9/1
Other than that, a few suggestions:
With these in mind, I would suggest using this pattern instead:
(\\.text\\.[^\\s]*)\\s*([x0-9a-f]*)\\s+([x0-9a-f]*)/g
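A minimal sketch of how this pattern might be applied to the lines from the question, assuming the lines are already in a List (as the question says they are) and that the long-name line keeps its trailing space. The class name SectionScan, the CONT pattern for the continuation line, and the check that skips lines starting with * are assumptions added for illustration; they are not part of the pattern above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionScan {
    // Suggested pattern as a Java string (no /g; find() does the searching).
    static final Pattern FIRST =
        Pattern.compile("(\\.text\\.[^\\s]*)\\s*([x0-9a-f]*)\\s+([x0-9a-f]*)");
    // Hypothetical continuation-line pattern: address, whitespace, size.
    static final Pattern CONT =
        Pattern.compile("^\\s*(0x[0-9a-f]+)\\s+(0x[0-9a-f]+)");

    // Returns one "name address size" string (decimal values) per section found.
    static List<String> scan(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().startsWith("*")) {
                continue;                       // assumption: skip *(.text.unlikely) entries
            }
            Matcher m = FIRST.matcher(line);
            if (!m.find()) {
                continue;
            }
            String addr = m.group(2);
            String size = m.group(3);
            if (addr.isEmpty() && i + 1 < lines.size()) {
                // Long name: address and size were pushed to the next line.
                Matcher c = CONT.matcher(lines.get(i + 1));
                if (c.find()) {
                    addr = c.group(1);
                    size = c.group(2);
                    i++;                        // the continuation line is consumed
                }
            }
            if (!addr.startsWith("0x") || !size.startsWith("0x")) {
                continue;                       // nothing usable captured
            }
            out.add(m.group(1) + " " + Long.decode(addr) + " " + Long.decode(size));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            " .text.rank 0x0000000000400b8f 0x351 is_x86.o",
            " .text.__sfmoreglue ",
            " 0x0000000000401d00 0x55 /mnt/drv2homelibc_popcorn.a(lib_a-findfp.o)",
            " *(.text.unlikely)");
        for (String s : scan(lines)) {
            System.out.println(s);
        }
    }
}
```

Looking ahead by index avoids the nested readLine() call; with a BufferedReader the same idea would need the pending name to be remembered between iterations.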
Upvotes: 1