I have a file formatted something like this:
./07/00-post.log:Referer: http://domain1.com/example/launch.jsp?BANKID=123&SOMEPARAM=123&...
./07/00-post.log:Referer: http://domain2.com/example/launch.jsp?PARAM=313&BANKID=13&...
...
...
./07/00-post.log:Referer: http://domainN.com/example/launch.jsp?BANKID=3213
I need to extract two substrings from each line, the domain and the BANKID value, into a separate file using a shell script, so that the output consists of domain/ID pairs.
I don't think cut will work here. What utilities can I use?
As the text is not homogeneous, you can use grep for this:
$ grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' file
domain1.com
123
domain2.com
13
domainN.com
3213
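Because -o prints every match on its own line, the domain and the ID come out on alternating lines. To get one pair per line in a separate file, as the question asks, the output can be piped through paste. A minimal sketch, assuming GNU grep built with PCRE support (needed for -P) and that every input line contains both a domain and a BANKID; if one is missing, the alternation gets out of step. The function name bankid_pairs and the file name pairs.txt are hypothetical.

```shell
# Hypothetical helper: print "domain BANKID" pairs, one per line, for file "$1".
# Assumes GNU grep with -P and exactly one domain and one BANKID per input line.
bankid_pairs() {
    # grep -o emits the domain and the ID on alternating lines;
    # paste - - reads stdin twice per output line, joining each two lines with a space.
    grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' "$1" | paste -d' ' - -
}

# Usage: bankid_pairs file > pairs.txt
```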
This in fact combines two different grep expressions:
Get the numbers after BANKID=:
$ grep -Po '(?<=BANKID=)\d*' file
123
13
3213
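The -P (Perl-compatible regex) option is a GNU grep extension and may not be available in every grep implementation. Where it is missing, one way to avoid the look-behind is to match the whole BANKID=... token and then cut away everything up to the =. A sketch; the function name bankid_values is hypothetical, and -o, while not POSIX, is widely supported.

```shell
# Hypothetical helper: print the BANKID value from each line of file "$1".
# No -P needed: match the whole "BANKID=<digits>" token, then keep the part after "=".
bankid_values() {
    grep -o 'BANKID=[0-9]*' "$1" | cut -d= -f2
}

# Usage: bankid_values file
```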
And get the domain after http:// up to the next /:
$ grep -Po '(?<=http://)[^/]*' file
domain1.com
domain2.com
domainN.com
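The domain can likewise be pulled out without -P by letting sed capture the text between http:// and the next /. A sketch, assuming a single http:// URL per line (with several, the greedy .* would select the last one); the function name url_domains is hypothetical.

```shell
# Hypothetical helper: print the host part of the http:// URL on each line of "$1".
# -n suppresses default output; the p flag prints only lines where s/// succeeded.
url_domains() {
    sed -n 's|.*http://\([^/]*\).*|\1|p' "$1"
}

# Usage: url_domains file
```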
Note that cut is a tool meant for text whose format is homogeneous. It does work for the domain part, because the domain is always the fifth /-separated field:
$ cut -d/ -f5 file
domain1.com
domain2.com
domainN.com
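But in general, this is a better job for grep or sed, given the BANKID requirement: its position within the query string varies from line to line, so no fixed field number will reliably find it.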
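With sed, both values can be captured in one substitution per line, so each pair stays together wherever BANKID sits in the query string, and lines without a BANKID are simply skipped. A sketch, assuming BANKID is preceded by ? or & and each line has a single http:// URL; the function name domain_bankid and the file name pairs.txt are hypothetical.

```shell
# Hypothetical helper: print "domain BANKID" for every line of "$1" that has both.
# \1 captures the host after http://, \2 the digits after ?BANKID= or &BANKID=.
# Lines where the substitution fails are not printed (-n plus the p flag).
domain_bankid() {
    sed -n 's|.*http://\([^/]*\)/.*[?&]BANKID=\([0-9]*\).*|\1 \2|p' "$1"
}

# Usage: domain_bankid file > pairs.txt
```

Because each output line comes from a single input line, a line missing one of the two values cannot shift the pairing of the lines after it.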