Reputation: 457
I have a xml like below
<root>
<FIToFICstmrDrctDbt>
<GrpHdr>
<MsgId>A</MsgId>
<CreDtTm>2001-12-17T09:30:47</CreDtTm>
<NbOfTxs>0</NbOfTxs>
<TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
<IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
<SttlmInf>
<SttlmMtd>CLRG</SttlmMtd>
<ClrSys>
<Prtry>xx</Prtry>
</ClrSys>
</SttlmInf>
<InstgAgt>
<FinInstnId>
<BIC>AAAAAAAAAAA</BIC>
</FinInstnId>
</InstgAgt>
</GrpHdr>
</FIToFICstmrDrctDbt>
</root>
I need to extract the value of each tag value in separate variables using awk command. how to do it?
Upvotes: 12
Views: 50205
Reputation: 586
The following gawk command uses a record separator regex pattern to match the XML tags. Anything starting with a < followed by at least one non-> and terminated by a > is considered to be a tag. Gawk assigns each RS match into the RT variable. Anything between the tags will be parsed as the record text which gawk assigns to $0.
gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile
Upvotes: 6
Reputation: 67281
below code stores all the tag values in an array!hope this helps. But i still belive this is not an optimal way to do it.
> perl -lne 'if(/>[^<]*</){$_=~m/>([^<]*)</;push(@a,$1)}if(eof){foreach(@a){print $_}}' temp
A
2001-12-17T09:30:47
0
0.0
1967-08-13
CLRG
xx
AAAAAAAAAAA
Upvotes: 0
Reputation: 274720
You can use awk
as shown below, however, this is NOT a robust solution and will fail if the xml is not formatted correctly e.g. if there are multiple elements on the same line.
$ dt=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' file)
$ echo $dt
1967-08-13
I suggest you use a proper xml processing tool, like xmllint
.
$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13
Upvotes: 26