Reputation: 3
I want to grab the regular expressions out of the snort rules.
Here's an example of the text that I've saved as a csv - https://rules.emergingthreats.net/open/snort-2.9.0/rules/emerging-exploit.rules
So there are multiple rules,
#by Akash Mahajan
#
alert udp $EXTERNAL_NET any -> $HOME_NET 14000 (msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; content:"|44 53 52 65 71 75 65 73 74|"; pcre:"/[0-9a-zA-Z]{50}/R"; reference:bugtraq,28084; reference:url,aluigi.altervista.org/adv/visibroken-adv.txt; reference:url,doc.emergingthreats.net/bin/view/Main/2007937; classtype:successful-dos; sid:2007937; rev:4;)
and I want only the text that appears after "pcre" in each of them, extracted and written to a new file, without the quotes. For example, this part of the rule above:
pcre:"/[0-9a-zA-Z]{50}/R";
From the line above, I want to end up with the following text:
/[0-9a-zA-Z]{50}/R
I want this for every place "pcre" appears in the whole file.
I've been messing around with grep, awk, and sed. I just can't figure it out. I'm fairly new to this.
Could anyone give me some tips?
Thanks
Upvotes: 0
Views: 195
Reputation: 71
You can do this with grep. The catch is that grep cannot print only a capture group; it can only print the text that the whole pattern matched.
To get around this, you need to use look-ahead and look-behind, which check the surrounding text without making it part of the match.
Lookahead (?=foo)
Asserts that what immediately follows the current position in the string is foo
Lookbehind (?<=foo)
Asserts that what immediately precedes the current position in the string is foo
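As a small illustration of the two assertions, here is a sketch on a made-up string (the sample text and variable names are not from the rules file, and it assumes GNU grep built with -P support):

```shell
# Hypothetical sample input, invented for this sketch.
sample='key:"value"; other'

# Without look-arounds, -o prints the delimiters along with the value.
with_delims=$(printf '%s\n' "$sample" | grep -Po 'key:"[^"]*"')

# With look-behind and look-ahead, the delimiters are checked but not printed.
inner=$(printf '%s\n' "$sample" | grep -Po '(?<=key:")[^"]*(?=")')

printf '%s\n%s\n' "$with_delims" "$inner"
```

The first command prints the value together with the key and quotes, while the second prints only the text between them.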
          ┌─ has pcre:" before matching group (look-behind)
          │               ┌─ has "; after matching group (look-ahead)
grep -Po '(?<=pcre:")(.*?)(?=";)' file
      ││             └─ what we want (matching group, lazy so it stops at the first ";)
      │└─ print only the matched part (-o)
      └─ use Perl-compatible regular expressions (-P), which look-arounds require
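Whether the middle of the pattern is greedy (.*) or lazy (.*?) matters when a rule has another quoted option after pcre. A small sketch with a made-up rule fragment (not from the rules file), again assuming GNU grep with -P support:

```shell
# Made-up rule fragment with a quoted option after pcre (hypothetical data).
line='pcre:"/abc/i"; content:"xyz"; sid:1;'

# Greedy .* runs on to the last "; on the line.
greedy=$(printf '%s\n' "$line" | grep -Po '(?<=pcre:").*(?=";)')

# Lazy .*? stops at the first "; after pcre:".
lazy=$(printf '%s\n' "$line" | grep -Po '(?<=pcre:").*?(?=";)')

printf 'greedy: %s\nlazy:   %s\n' "$greedy" "$lazy"
```

On the rule in the question both forms give the same result, because nothing quoted follows the pcre option there.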
Upvotes: 0
Reputation: 203209
With GNU sed:
$ sed -n -r 's/.*\<pcre:"([^"]+).*/\1/p' file
/[0-9a-zA-Z]{50}/R
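To cover the "printed to a new file" part of the question, redirect the output. A minimal sketch using a scratch directory and made-up rules (the file names and rule contents are placeholders, and -r with \< assumes GNU sed):

```shell
# Work in a scratch directory; the file names here are placeholders.
workdir=$(mktemp -d)
rules="$workdir/sample.rules"
out="$workdir/pcre.txt"

# Two made-up rules with a pcre option and one without (hypothetical data).
cat > "$rules" <<'EOF'
alert udp any any -> any 14000 (msg:"demo one"; pcre:"/[0-9a-zA-Z]{50}/R"; sid:1;)
alert tcp any any -> any 80 (msg:"demo two"; content:"abc"; sid:2;)
alert tcp any any -> any 80 (msg:"demo three"; pcre:"/foo\d+/i"; sid:3;)
EOF

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so rules without pcre are skipped.
sed -n -r 's/.*\<pcre:"([^"]+).*/\1/p' "$rules" > "$out"

cat "$out"
```

Note that the leading .* is greedy, so if a single rule contains more than one pcre option, this prints only the last one on that line.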
Upvotes: 1