We need to extract multiple tag values from multiple files.
We have around 1000 files with data similar to:
<Employee>
<Id>432361</Id>
<EmpName>Stuart</EmpName>
<SidNumber>0251115</SidNumber>
<CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
<EpisodeId>682082</EpisodeId>
<CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>
We need to extract EmpName, SidNumber and EpisodeId from all the files into a single file. We are able to get one value at a time, for example using the command:
nawk -F'[<>]' '/<EpisodeId>/{print $3}' *.dat
But we need to get multiple tags from each file. The output format should be something like:
EmpName Stuart SidNumber 0251115 EpisodeId 682082
EmpName Stuart SidNumber 0251115 EpisodeId 682082
or at least space-delimited values:
Stuart 0251115 682082
Stuart 0251115 682082
Any help would be appreciated.
Thanks in advance, Vivek
Try this (I created two sample files, f1.txt and f2.txt):
$ head f?.txt
==> f1.txt <==
<Employee>
<Id>432361</Id>
<EmpName>Stuart</EmpName>
<SidNumber>0251115</SidNumber>
<CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
<EpisodeId>682082</EpisodeId>
<CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>
==> f2.txt <==
<Employee>
<Id>432361</Id>
<EmpName>vipin</EmpName>
<SidNumber>0251117</SidNumber>
<CreatedUtc>2016-12-14T22:27:53.477+08:00</CreatedUtc>
<EpisodeId>682082</EpisodeId>
<CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>
Processing the files:
$ for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i";done
Stuart 0251115 682082
vipin 0251117 682082
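With around 1000 files, the loop above starts one awk process per file. A sketch that does the same extraction in a single awk invocation and sends everything to one output file, as the question asks. It assumes each record ends with a </Employee> line, as in the sample files; extract_tags is a hypothetical helper name, not part of any tool.

```shell
# extract_tags: hypothetical helper. Runs ONE awk process over all files.
# Assumption: every record is closed by a </Employee> line.
extract_tags() {
  awk -F'[<>]' '
    # With FS = [<>], "<EmpName>Stuart</EmpName>" splits so $2 = tag, $3 = value
    /<(EmpName|SidNumber|EpisodeId)>/ { printf "%s%s", sep, $3; sep = " " }
    # End of a record: finish the line and reset the separator
    /<\/Employee>/                    { print ""; sep = "" }
  ' "$@"
}

# Usage: collect all files into a single output file
# extract_tags *.dat > employees.txt
```

Matching the full opening tag (<EmpName> rather than just EmpName) avoids picking up other tags whose names merely contain those words.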
For properly formatted output:
$ for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i";done|column -t
Stuart 0251115 682082
vipin 0251117 682082
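The question's first preferred format keeps each tag name next to its value. Since -F'[<>]' leaves the tag name in $2 and the value in $3, printing both gives that layout, and column -t can then align it. A sketch under the same single-pass and </Employee> assumptions; extract_labeled is a hypothetical name.

```shell
# extract_labeled: hypothetical helper. Prints "TagName value" pairs per record,
# e.g. "EmpName Stuart SidNumber 0251115 EpisodeId 682082".
extract_labeled() {
  awk -F'[<>]' '
    /<(EmpName|SidNumber|EpisodeId)>/ { printf "%s%s %s", sep, $2, $3; sep = " " }
    /<\/Employee>/                    { print ""; sep = "" }
  ' "$@"
}

# Usage: align the pairs and write one combined file
# extract_labeled *.dat | column -t > employees.txt
```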
If you don't have the column command available, you can try the command below:
for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%-10s", $3 OFS} END {print ""}' "$i";done
Stuart    0251115   682082
vipin     0251117   682082
With awk's printf function you can set the column widths: %-10s left-justifies each value in a 10-character field.
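The commands above print values in whatever order the tags appear in each file, so a file with the tags in a different order would shift the columns. A sketch that stores each value in an array keyed by tag name and prints the row with a fixed printf format, assuming the same </Employee> record terminator; extract_fixed is a hypothetical name. split("", v) is used to empty the array because deleting a whole array with delete v was historically an extension not available in every awk.

```shell
# extract_fixed: hypothetical helper. Column order is fixed by the printf
# call, not by the order of the tags inside each file.
extract_fixed() {
  awk -F'[<>]' '
    # Remember each wanted value under its tag name ($2)
    /<(EmpName|SidNumber|EpisodeId)>/ { v[$2] = $3 }
    /<\/Employee>/ {
      # %-10s left-justifies a value in a 10-character column
      printf "%-10s%-10s%s\n", v["EmpName"], v["SidNumber"], v["EpisodeId"]
      # Portable way to empty the array so values do not leak into the next record
      split("", v)
    }
  ' "$@"
}

# Usage:
# extract_fixed *.dat > employees.txt
```

Changing the format string to "EmpName %s SidNumber %s EpisodeId %s\n" prints the labeled form instead.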