drecal

Reputation: 3

Extract Text From CSV

I want to extract the regular expressions from a set of Snort rules.

Here's an example of the text that I've saved as a csv - https://rules.emergingthreats.net/open/snort-2.9.0/rules/emerging-exploit.rules

The file contains multiple rules like this one:

#by Akash Mahajan
#
alert udp $EXTERNAL_NET any -> $HOME_NET 14000 (msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; content:"|44 53 52 65 71 75 65 73 74|"; pcre:"/[0-9a-zA-Z]{50}/R"; reference:bugtraq,28084; reference:url,aluigi.altervista.org/adv/visibroken-adv.txt; reference:url,doc.emergingthreats.net/bin/view/Main/2007937; classtype:successful-dos; sid:2007937; rev:4;)

I want only the text that appears after "pcre:" in each of them, without the surrounding quotes, extracted and written to a new file. For example, take this part of the rule:

 pcre:"/[0-9a-zA-Z]{50}/R";

From the line above, I want to end up with this text:

 /[0-9a-zA-Z]{50}/R

The same should happen everywhere "pcre" appears in the file.

I've been messing around with grep, awk, and sed. I just can't figure it out. I'm fairly new to this.

Could anyone give me some tips?

Thanks

Upvotes: 0

Views: 195

Answers (2)

silverdrop

Reputation: 71

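You can do this with grep. The catch is that grep cannot print just a capture group; with -o it prints the whole matched text.
To work around this, use look-behind and look-ahead assertions, which check the text around the match without becoming part of the output. They are available through GNU grep's -P option.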

Lookahead (?=foo)
Asserts that what immediately follows the current position in the string is foo

Lookbehind (?<=foo)
Asserts that what immediately precedes the current position in the string is foo
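
A quick way to see both assertions in action. This is only a minimal sketch, assuming GNU grep built with PCRE support (the -P option); the sample string and the variable name `result` are made up for illustration:

```shell
# Hypothetical sample text: only "bar" sits between "foo" and "baz".
sample='foobarbaz'

# (?<=foo) requires "foo" immediately before the match,
# (?=baz) requires "baz" immediately after it.
# Neither assertion is included in what -o prints.
result=$(printf '%s\n' "$sample" | grep -Po '(?<=foo)[a-z]+(?=baz)')

printf '%s\n' "$result"
```

Only the text between the two assertions is printed, which is the behaviour needed for pulling out the pcre value.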

   ┌─ print the file to standard output
   │                     ┌─ must be preceded by pcre:" (look-behind)
   │                     │              ┌─ must be followed by "; (look-ahead)
cat file | grep -Po '(?<=pcre:")[^"]*(?=";)'
                 ││               └─ what we want: everything up to the next double quote
                 │└─ -o: print only the matched part
                 └─ -P: treat the pattern as a Perl-compatible regex (needed for look-arounds)
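
Since the question asks for the results in a new file, here is a sketch of one way to do that. It assumes GNU grep with -P, uses `[^"]*` so the match stops at the first closing quote, and works on a temporary sample file holding a shortened copy of the rule from the question. The variable names `rules` and `out` are placeholders, not anything from the original post:

```shell
# Build a tiny sample rules file (shortened copy of the rule in the question).
rules=$(mktemp)
out=$(mktemp)

cat > "$rules" <<'EOF'
#by Akash Mahajan
alert udp $EXTERNAL_NET any -> $HOME_NET 14000 (msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; content:"|44 53 52 65 71 75 65 73 74|"; pcre:"/[0-9a-zA-Z]{50}/R"; sid:2007937; rev:4;)
EOF

# Read the file directly (no cat needed) and redirect the matches to a new file.
# [^"]* cannot run past a double quote, so msg:"..." and content:"..." are never swallowed.
grep -Po '(?<=pcre:")[^"]*(?=";)' "$rules" > "$out"

cat "$out"
```

With a real rules file, replace "$rules" with its path and "$out" with the name of the file you want to create.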

Upvotes: 0

Ed Morton

Reputation: 203209

With GNU sed:

$ sed -n -r 's/.*\<pcre:"([^"]+).*/\1/p' file
/[0-9a-zA-Z]{50}/R
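
For anyone new to sed, here is the same command annotated piece by piece. It is a sketch only: it assumes GNU sed (both -r and \< are GNU extensions) and feeds in a shortened copy of the rule from the question through a variable instead of a file. The names `rule` and `result` are made up for the example:

```shell
# Shortened copy of the rule from the question; single quotes keep $ and " literal.
rule='alert udp $EXTERNAL_NET any -> $HOME_NET 14000 (msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; pcre:"/[0-9a-zA-Z]{50}/R"; sid:2007937; rev:4;)'

# -n        do not print lines automatically
# -r        extended regex, so ( ) and + need no backslashes
# .*\<pcre:"  skip everything up to pcre:" where "pcre" starts a word
# ([^"]+)   capture the characters up to the next double quote
# .*        consume the rest of the line
# \1 and p  replace the line with the capture, print only lines where that happened
result=$(printf '%s\n' "$rule" | sed -n -r 's/.*\<pcre:"([^"]+).*/\1/p')

printf '%s\n' "$result"
```

Because the leading .* is greedy, a line containing several pcre options yields only the last one with this approach.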

Upvotes: 1
