Reputation: 49

How to extract only specific strings from each line of a file using awk?

Is there a generic way, using awk, to extract a specific string from each line of a file? By design the string is always eleven alphanumeric characters. For example:

cat ext.txt

This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2  is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX 
This field must be 0186 when Rule_1 = 'FR' and Rule_2  is 'EUROFRANSBI' or  'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'

I want to extract only the codes:

MGTCBEBEECL 
MGTCBEBEE01 
PARBFRPPXXX 
CITIFRPPXXX 
CIBCCATTXXX

There are almost 100 such lines from which I am hoping to extract these distinct strings, but I am at my wits' end as to how to make the approach generic and non-redundant, hence I am seeking this community's assistance!

Upvotes: 0

Answers (5)

RavinderSingh13

Reputation: 133750

We can use awk's match() function. This was written and tested in GNU awk and should work in any awk that supports interval expressions such as {11}. The regex [[:alnum:]]{11} matches 11 consecutive alphanumeric characters in each line; when a match is found, substr() prints the matched text using RSTART and RLENGTH.

awk  'match($0,/[[:alnum:]]{11}/){print substr($0,RSTART,RLENGTH)}' Input_file
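One caveat is that match() reports the leftmost match, so on the sample lines that contain 'EUROFRANSBI' the one-liner above would print that word rather than the code. A minimal sketch of one way around this follows. It assumes the wanted code is the last token on the line made of exactly eleven uppercase letters or digits, which holds for the sample data but may not hold for every input. The function name extract_codes is hypothetical. It uses length() instead of {11}, so it does not depend on interval-expression support.

```shell
# Hypothetical helper: print, for each line, the last token that consists of
# exactly 11 uppercase letters or digits. Lines without such a token print nothing.
extract_codes() {
  awk '{
    code = ""
    # Split on runs of characters that are not letters, digits or underscores,
    # so quotes and spaces are dropped and EURO_NEAR stays a single token.
    n = split($0, tok, "[^A-Za-z0-9_]+")
    for (i = 1; i <= n; i++)
      if (length(tok[i]) == 11 && tok[i] ~ /^[A-Z0-9]+$/)
        code = tok[i]          # remember the most recent candidate
    if (code != "")
      print code
  }' "$@"
}
```

It could be called as extract_codes ext.txt, or fed from a pipe.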

Upvotes: 0

Ed Morton

Reputation: 204488

Using any sed that has -E to enable EREs, e.g. GNU and BSD seds:

$ sed -En "s/.*code (is|must be) '?([[:upper:][:digit:]]+).*/\2/p" file
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX

Upvotes: 1

Carlos Pascual

Reputation: 1126

There is a way with GNU awk using FPAT:

awk -v FPAT='[[:alnum:]]{11}' '{print $NF}' file
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX

Setting FPAT to '[[:alnum:]]{11}' makes GNU awk treat each run of eleven alphanumeric characters as a field, and {print $NF} then prints the last such field on the line.
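When a line contains no eleven-character run, NF is 0, so $NF is $0 and the whole line gets printed. The sample data has a code on every line, so this does not show up there. A small sketch of a guarded variant, assuming gawk is installed (the function name extract_last_code is hypothetical):

```shell
# Hypothetical helper: same FPAT approach, but skip lines where no
# 11-character alphanumeric run was found (NF is 0 on such lines).
extract_last_code() {
  gawk -v FPAT='[[:alnum:]]{11}' 'NF { print $NF }' "$@"
}
```

The NF pattern is the only change; it is true only when at least one field matched.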

Upvotes: 2

Thor

Reputation: 47219

With the current examples you can do it with grep like this:

<ext.txt grep -oE "(code is|code must be) '?[A-Z0-9]{11}'?" | 
tr -d "'"                                                   |
grep -o '[^ ]*$'

Output:

MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX

Upvotes: 0

Luuk

Reputation: 14958

Using gawk:

gawk -F "[ ']" 'BEGIN{ r=@/[A-Z]{11}/ }r{ for (i=1; i<=NF;i++){ if($i~r) print $i} }' ext.txt

-F "[ ']" uses a space or ' as the field separator (so that quoted codes like 'CIBCCATTXXX' are also found)
r=@/[A-Z]{11}/ assigns the regular expression to a variable (because it is used twice in the script)
for(... loops over all the fields in a line and prints each field that matches the regular expression.

output:

MGTCBEBEECL
EUROFRANSBI
PARBFRPPXXX
EUROFRANSBI
CITIFRPPXXX
CIBCCATTXXX

Upvotes: 1
