user3405864

Reputation: 1

Trouble with finding substrings with shell script

I have a file formatted something like this:

./07/00-post.log:Referer: http://domain1.com/example/launch.jsp?BANKID=123&SOMEPARAM=123&...
./07/00-post.log:Referer: http://domain2.com/example/launch.jsp?PARAM=313&BANKID=13&...
...
...
./07/00-post.log:Referer: http://domainN.com/example/launch.jsp?BANKID=3213

I need to find and extract the following substrings from each line into a separate file, using a shell script:

  1. Domain names between "http://" and "/" (domain1.com, domain2.com, ...)
  2. The BANKIDs for those domains (they can appear at different positions)

so that I get pairs of domains and IDs as output.

I think cut won't work here. What utilities can I use?

Upvotes: 0

Views: 49

Answers (1)

fedorqui

Reputation: 289725

You can use grep for this:

$ grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' file
domain1.com
123
domain2.com
13
domainN.com
3213

This in fact combines two different grep expressions:

Get the numbers after BANKID=:

$ grep -Po '(?<=BANKID=)\d*' file
123
13
3213

And get the domain after http://, up to the next /:

$ grep -Po '(?<=http://)[^/]*' file
domain1.com
domain2.com
domainN.com
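
Since the question asks for domain/ID pairs, one possible way to fold the alternating output of the combined expression into one pair per line is to pipe it through paste. This is only a sketch: sample.log and pairs.txt are hypothetical file names, the sample lines are the three visible in the question, and it assumes every line contains exactly one http:// and one non-empty BANKID= value (otherwise the pairs drift out of step). The -P option also assumes GNU grep.

```shell
# Hypothetical sample input, copied from the three lines shown in the question.
cat > sample.log <<'EOF'
./07/00-post.log:Referer: http://domain1.com/example/launch.jsp?BANKID=123&SOMEPARAM=123&...
./07/00-post.log:Referer: http://domain2.com/example/launch.jsp?PARAM=313&BANKID=13&...
./07/00-post.log:Referer: http://domainN.com/example/launch.jsp?BANKID=3213
EOF

# grep prints the domain and the ID on alternating lines;
# "paste - -" joins every two consecutive lines with a tab.
# pairs.txt is an assumed output file name.
grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' sample.log | paste - - > pairs.txt

cat pairs.txt
```

Each line of pairs.txt then holds a domain and its BANKID separated by a tab.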

Note that cut is meant for text with a homogeneous format, where the wanted field always sits at the same position. It can work for the domain part:

$ cut -d/ -f5 file
domain1.com
domain2.com
domainN.com

But in general, grep or sed is a better fit here, because BANKID can appear at different positions in the query string.
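
For the sed route, a minimal sketch could capture both values in a single substitution, so each domain stays tied to the BANKID found on the same line. The file names sample.log and pairs_sed.txt are hypothetical, the sample lines are the three visible in the question, and the pattern assumes BANKID is numeric and directly preceded by ? or &.

```shell
# Hypothetical sample input, copied from the three lines shown in the question.
cat > sample.log <<'EOF'
./07/00-post.log:Referer: http://domain1.com/example/launch.jsp?BANKID=123&SOMEPARAM=123&...
./07/00-post.log:Referer: http://domain2.com/example/launch.jsp?PARAM=313&BANKID=13&...
./07/00-post.log:Referer: http://domainN.com/example/launch.jsp?BANKID=3213
EOF

# \1 = text between "http://" and the next "/", \2 = digits after BANKID=.
# -n together with the p flag prints only lines where the substitution matched,
# so lines without a BANKID are skipped rather than misaligned.
# pairs_sed.txt is an assumed output file name.
sed -nE 's#.*http://([^/]*)/.*[?&]BANKID=([0-9]*).*#\1 \2#p' sample.log > pairs_sed.txt

cat pairs_sed.txt
```

Using # as the substitution delimiter avoids having to escape the slashes in http://.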

Upvotes: 1
