Reputation: 45
I am trying to extract different details from multiple lines using awk.
However, I cannot get my test to run, and I also cannot print the resulting output on one line per block.
The information is located in different blocks, and I need to extract details from inside each block.
awk '
TRA TRB TRC
/EKYC/{for(i=1; i<10; i++)
{ (getline p )
if ( match(p,"TRA")) { print substr(p,4)}
if ( match(p,"TRB")) { print substr(p,4)}
if ( match(p,"TRC")) { print substr(p,4)}
}
}
' inputfile
Each block starts with an EKYC line, and the codes TRA, TRB and TRC appear between the EKYC lines.
A sample text file looks like this:
EKYC
TRA onlyThisTRA1
TRB onlyThisTRB1
THR notThis
EKYC
TRA onlyThisTRA2
TRB onlyThisTRB2
TRC onlyThisTRC2
EKYC
NOT
TRA onlyThisTRA3
YEH not this
TRC onlyThisTRC3
Desired output, one line per block:
onlyThisTRA1 onlyThisTRB1 null
onlyThisTRA2 onlyThisTRB2 onlyThisTRC2
onlyThisTRA3 null onlyThisTRC3
Upvotes: 1
Reputation: 203522
Whenever you have name-to-value pairs as you have in your data, the best approach is to first create an array capturing that mapping (n2v[] below) and then you can just reference the values by their names:
$ cat tst.awk
BEGIN { OFS="\t" }
/EKYC/ { prt(); next }
{ n2v[$1] = $2 }
END { prt() }
function prt() { if (length(n2v)) print v("TRA"), v("TRB"), v("TRC"); delete n2v }
function v(n) { return (n in n2v ? n2v[n] : "null") }
$ awk -f tst.awk file
onlyThisTRA1 onlyThisTRB1 null
onlyThisTRA2 onlyThisTRB2 onlyThisTRC2
onlyThisTRA3 null onlyThisTRC3
Notice that in the above each name you're interested in appears exactly once (not in both upper and lower case), and there are no variables named after the values in your data. So if you need to print an additional name (e.g. "THC") you just add , v("THC") inside the prt() function. The default null value is also specified in only one place, so if you want a different default, or a different algorithm for determining the default, you just change the v() function.
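As a rough illustration of those two points, here is a minimal sketch that prints one extra name (THR) and takes the default from the command line. The variable names dflt and seen are assumptions introduced for this example and are not part of the script above; the seen flag stands in for length(n2v) only so that the sketch does not depend on an awk that supports length() on arrays.

```shell
# Sample input copied from the question.
sample='EKYC
TRA onlyThisTRA1
TRB onlyThisTRB1
THR notThis
EKYC
TRA onlyThisTRA2
TRB onlyThisTRB2
TRC onlyThisTRC2
EKYC
NOT
TRA onlyThisTRA3
YEH not this
TRC onlyThisTRC3'

out=$(printf '%s\n' "$sample" | awk -v dflt='-' '
BEGIN  { OFS = "\t" }
/EKYC/ { prt(); next }                 # a new block starts: flush the previous one
       { n2v[$1] = $2; seen = 1 }      # remember name -> value for the current block
END    { prt() }                       # flush the final block
function prt(   k) {
    # The extra name THR is added in exactly one place.
    if (seen) print v("TRA"), v("TRB"), v("TRC"), v("THR")
    for (k in n2v) delete n2v[k]       # element-wise delete works in any POSIX awk
    seen = 0
}
# The default is decided in exactly one place (dflt is a hypothetical -v variable).
function v(n) { return (n in n2v ? n2v[n] : dflt) }
')
printf '%s\n' "$out"
```

With the sample data this should print four tab-separated columns per block, with "-" wherever a name is missing from that block.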
It would actually be trivial to modify the script to accept a list of names to be printed on the command line:
$ cat tst.awk
BEGIN { OFS="\t" }
/EKYC/ { prt(); next }
{ val=$0; sub(/^[^[:space:]]+[[:space:]]+/,"",val); n2v[$1] = val }
END { prt() }
function prt(    nameList,nameNr,numNames) {
    if (length(n2v)) {
        numNames = split(names,nameList)
        for (nameNr=1; nameNr <= numNames; nameNr++) {
            printf "%s%s", v(nameList[nameNr]), (nameNr<numNames ? OFS : ORS)
        }
        delete n2v
    }
}
function v(n) { return (n in n2v ? n2v[n] : "null") }
$ awk -v names='TRA TRB TRC' -f tst.awk file
onlyThisTRA1 onlyThisTRB1 null
onlyThisTRA2 onlyThisTRB2 onlyThisTRC2
onlyThisTRA3 null onlyThisTRC3
$ awk -v names='TRA THR TRC YEH' -f tst.awk file
onlyThisTRA1 notThis null null
onlyThisTRA2 null onlyThisTRC2 null
onlyThisTRA3 null onlyThisTRC3 not this
Note that I modified the way that n2v[] is populated in that second script to allow for spaces in your values, since your YEH value (which I'm now printing above) has a space in it. If there are no spaces then that change isn't required, and if the separator is a tab then you can just set FS="\t" in the BEGIN section and again you don't need that modification (though split() in prt() would then need an explicit " " separator, since it splits on FS by default).
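A minimal sketch of the tab-separated case, assuming the input really uses a single tab between name and value (the sample data in the question uses spaces, so the input below is made up for illustration). The seen flag is again an assumption of this sketch, used in place of length(n2v).

```shell
# Hypothetical tab-separated input: name<TAB>value, where values may contain spaces.
out=$(printf 'EKYC\nTRA\tonlyThisTRA1\nYEH\tnot this\nEKYC\nTRA\tonlyThisTRA2\n' |
awk -v names='TRA YEH' '
BEGIN  { FS = OFS = "\t" }             # with FS set to tab, $2 is the whole value
/EKYC/ { prt(); next }
       { n2v[$1] = $2; seen = 1 }
END    { prt() }
function prt(   nameList, nameNr, numNames, k) {
    if (seen) {
        # Explicit " " separator: the default would be FS, which is now a tab.
        numNames = split(names, nameList, " ")
        for (nameNr = 1; nameNr <= numNames; nameNr++)
            printf "%s%s", v(nameList[nameNr]), (nameNr < numNames ? OFS : ORS)
    }
    for (k in n2v) delete n2v[k]
    seen = 0
}
function v(n) { return (n in n2v ? n2v[n] : "null") }
')
printf '%s\n' "$out"
```

The sub() call from the second script is no longer needed here because the field splitting itself keeps "not this" together as $2.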
Upvotes: 1
Reputation: 92854
awk solution:
awk 'function pr(a){
n="null"; tra=a["TRA"]; trb=a["TRB"]; trc=a["TRC"];
printf "%s %s %s\n",(tra)? tra:n,(trb)? trb:n,(trc)? trc:n; delete a
}
/EKYC/{ if(f){ pr(a); f=0 } }
/^TR[ABC]/{ a[$1]=$2; f=1 }END{ pr(a) }' file
The output:
onlyThisTRA1 onlyThisTRB1 null
onlyThisTRA2 onlyThisTRB2 onlyThisTRC2
onlyThisTRA3 null onlyThisTRC3
Upvotes: 1
Reputation: 12877
Using awk multi-dimensional arrays:
awk '/EKYC/ { cnt++;cnt1=0 } $0 != "EKYC" { cnt1++;if ($2 ~ "not") { $2 = "null" } dat[cnt,cnt1]=$2 } END { for (i=1;i<=cnt;i++) { for (p=1;p<=cnt1;p++) { printf "%s\t",dat[i,p] } print "" } }' filename
Increment cnt each time EKYC is seen and reset cnt1. Use the two counters to build an array holding the second space-delimited field of each line. Finally, loop through the multi-dimensional array to print the data.
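Because the positional index cnt1 depends on how many lines each block happens to contain, columns can shift when a block has missing or extra lines. A possible variation, sketched here under the assumption that only TRA, TRB and TRC are wanted, is to index the array by block number and name instead. This is not the code above, only an illustration of the same multi-dimensional idea, and the short input is made up for the example.

```shell
# Hypothetical short input: block 1 lacks TRB and TRC, block 2 lacks TRA.
out=$(printf 'EKYC\nTRA a1\nTHR x\nEKYC\nTRB b2\nTRC c2\n' |
awk '
/EKYC/ { cnt++; next }                 # count blocks
       { dat[cnt, $1] = $2 }           # key is (block number, name)
END {
    n = split("TRA TRB TRC", names, " ")
    for (i = 1; i <= cnt; i++)
        for (j = 1; j <= n; j++)
            # Print the stored value, or "null" if this block has no such name.
            printf "%s%s", ((i, names[j]) in dat ? dat[i, names[j]] : "null"), (j < n ? "\t" : "\n")
}')
printf '%s\n' "$out"
```

Each output line then always has exactly three columns in TRA, TRB, TRC order, regardless of which other lines appear in the block.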
Upvotes: 1
Reputation: 785156
You can use this awk command:
awk '/EKYC/{if (seen++) print tra, trb, trc; tra=trb=trc="null"; next}
$1=="TRA"{tra=$2} $1=="TRB"{trb=$2} $1=="TRC"{trc=$2}
END{print tra, trb, trc}' file
onlyThisTRA1 onlyThisTRB1 null
onlyThisTRA2 onlyThisTRB2 onlyThisTRC2
onlyThisTRA3 null onlyThisTRC3
Upvotes: 1