Reputation: 49
I was wondering if there is a generic way, using awk, to extract a specific string that by design is an eleven-character alphanumeric string. For example:
cat ext.txt
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
I want to extract only the codes:
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
There are almost 100 such lines from which I am hoping to extract these distinct strings, but I am at my wits' end about how to make this more generic and non-redundant, so I am seeking this community's assistance!
Upvotes: 0
Views: 1080
Reputation: 133428
We could use the match function of awk. This was written and tested in GNU awk and should work in any awk that supports interval expressions such as {11} (some older awk implementations, for example older mawk releases, do not). A simple explanation: awk's match function with the regex [[:alnum:]]{11} matches 11 consecutive alphanumeric characters in each line, and if a match is found, the matched substring is printed.
awk 'match($0,/[[:alnum:]]{11}/){print substr($0,RSTART,RLENGTH)}' Input_file
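Note that match() returns only the first run of eleven alphanumerics, so on the PARBFRPPXXX and CITIFRPPXXX sample lines the command above prints EUROFRANSBI instead of the code. Below is a minimal sketch that prints the last eleven-character word on each line instead, assuming the code is always the last such word (true for this sample, not guaranteed for the other ~100 lines). It splits on runs of non-alphanumerics and checks length(), so it needs no {11} interval expression and never matches eleven characters inside a longer word. extract_codes is a hypothetical helper name.

```shell
# Recreate the sample input from the question as ext.txt.
cat > ext.txt <<'EOF'
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
EOF

# extract_codes (hypothetical name): treat every run of letters/digits as a
# field, then print the LAST field that is exactly eleven characters long.
# Assumption: the wanted code is the last eleven-character word on its line.
extract_codes() {
    awk -F '[^A-Za-z0-9]+' '{
        # Walk the fields from the end so the last 11-character word wins.
        for (i = NF; i > 0; i--)
            if (length($i) == 11) { print $i; break }
    }' "$1"
}

extract_codes ext.txt
```

With the sample input it prints the five codes in order, one per line.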
Upvotes: 0
Reputation: 203149
Using any sed that has -E to enable EREs, e.g. GNU and BSD seds:
$ sed -En "s/.*code (is|must be) '?([[:upper:][:digit:]]+).*/\2/p" file
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
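The question mentions about 100 lines and asks for distinct strings. If a code can repeat, one option is a sketch that pipes the sed command above through awk '!seen[$0]++', which prints a line only the first time it is seen and therefore keeps the original order (sort -u would also de-duplicate, but it sorts). The repeated last line in the sample is added only for illustration, and distinct_codes is a hypothetical helper name.

```shell
# Sample input from the question, plus one repeated line to show de-duplication.
cat > ext.txt <<'EOF'
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
This is a sample field where the code is MGTCBEBEECL for NR
EOF

# distinct_codes (hypothetical name): the sed answer extracts one code per
# matching line; awk then drops any code it has already printed.
distinct_codes() {
    sed -En "s/.*code (is|must be) '?([[:upper:][:digit:]]+).*/\2/p" "$1" |
        awk '!seen[$0]++'
}

distinct_codes ext.txt
```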
Upvotes: 1
Reputation: 1126
There is a way with GNU awk using FPAT:
awk -v FPAT='[[:alnum:]]{11}' '{print $NF}' file
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
FPAT='[[:alnum:]]{11}' makes GNU awk treat each run of eleven alphanumeric characters as a field, and {print $NF} prints the last such field on each line, which is the desired code.
Upvotes: 2
Reputation: 47089
With the current examples you can do it with grep like this:
<ext.txt grep -oE "(code is|code must be) '?[A-Z0-9]{11}'?" |
tr -d "'" |
grep -o '[^ ]*$'
Output:
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
Upvotes: 0
Reputation: 14899
Using gawk:
gawk -F "[ ']" 'BEGIN{ r=@/[A-Z]{11}/ }r{ for (i=1; i<=NF;i++){ if($i~r) print $i} }' ext.txt
-F "[ ']" uses a space or ' as the field separator (so that quoted codes like 'CIBCCATTXXX' are also found).
r=@/[A-Z]{11}/ assigns the regular expression to a variable, because it is used twice in the script.
for(... loops over all the fields in a line and prints each field that matches the regular expression.
Output:
MGTCBEBEECL
EUROFRANSBI
PARBFRPPXXX
EUROFRANSBI
CITIFRPPXXX
CIBCCATTXXX
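For awks other than gawk, here is a portable sketch of the same field loop. It avoids the @/.../ typed regex constant and the {11} interval, widens the class to [A-Z0-9] so that MGTCBEBEE01 is also found, and anchors the test so only whole fields of exactly eleven characters match. EUROFRANSBI is eleven uppercase letters too, so it is still printed: a character-class test alone cannot exclude it, which is why the sed and grep answers key on the preceding "code is" / "code must be" text. whole_fields is a hypothetical helper name.

```shell
# Recreate the sample input from the question as ext.txt.
cat > ext.txt <<'EOF'
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
EOF

# whole_fields (hypothetical name): split on a space or a single quote, as in
# the gawk answer, and print every field that consists only of uppercase
# letters/digits and is exactly eleven characters long.
whole_fields() {
    awk -F "[ ']" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[A-Z0-9]+$/ && length($i) == 11) print $i
    }' "$1"
}

whole_fields ext.txt
```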
Upvotes: 1