Vivek Vishal

Reputation: 83

How to extract multiple tag values from multiple XML files in Linux

We need to extract multiple tag values from multiple files.

We have around 1000 files with data similar to:

<Employee>
  <Id>432361</Id>
  <EmpName>Stuart</EmpName>
  <SidNumber>0251115</SidNumber>
  <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
  <EpisodeId>682082</EpisodeId>
  <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>

We need to extract EmpName, SidNumber, and EpisodeId from all the files into a single file. We can get one value at a time, for example with this command:

nawk -F'[<>]' '/<EpisodeId>/{print $3}' *.dat

But we need to get multiple tags from each file. The output format should be something like:

EmpName Stuart SidNumber 0251115 EpisodeId 682082
EmpName Stuart SidNumber 0251115 EpisodeId 682082 

or at least space-delimited values:

Stuart 0251115 682082
Stuart 0251115 682082

Any help would be appreciated.

Thanks in advance, Vivek

Upvotes: 0

Views: 3079

Answers (2)

VIPIN KUMAR

Reputation: 3147

Try this (I created two sample files, f1.txt and f2.txt):

$ head f?.txt
==> f1.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>Stuart</EmpName>
      <SidNumber>0251115</SidNumber>
      <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

==> f2.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>vipin</EmpName>
      <SidNumber>0251117</SidNumber>
      <CreatedUtc>2016-12-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

Processing...

$ for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i";done
 Stuart 0251115 682082 
 vipin 0251117 682082 

For properly aligned output, pipe the result through column -t:

$ for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i";done|column -t
Stuart  0251115  682082
vipin   0251117  682082

If you don't have the column command available, you can try the command below:

$ for i in f?.txt;do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%-10s", $3OFS} END {print ""}' "$i";done
Stuart    0251115   682082    
vipin     0251117   682082 

The printf function in awk lets us format the column values; here "%-10s" left-justifies each value in a 10-character field.
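
The question also asked for a labeled format (EmpName Stuart SidNumber ...). As a minimal sketch of one way to get that, a single awk invocation can read all the files and start a new output line whenever FNR resets to 1, which removes the need for the shell loop. The function name extract_labeled is hypothetical, and the sketch assumes each tag sits on its own line as in the sample data.

```shell
# Hypothetical helper: print "Tag value" pairs, one output line per input file.
# Assumes each tag appears on its own line, as in the sample data.
extract_labeled() {
    awk -F'[<>]' '
        # First line of every file after the first: flush the previous row.
        FNR == 1 && NR > 1 { if (line != "") print line; line = "" }
        # With FS set to [<>], $2 is the tag name and $3 is its value.
        $2 == "EmpName" || $2 == "SidNumber" || $2 == "EpisodeId" {
            line = line (line == "" ? "" : " ") $2 " " $3
        }
        # Flush the row belonging to the last file.
        END { if (line != "") print line }
    ' "$@"
}
```

It could be called as extract_labeled *.dat > all_employees.txt, where the output file name is only an example. Comparing $2 exactly, rather than matching a regex against the whole line, avoids picking up tags whose names merely contain one of the three names.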

Upvotes: 1

pyed

Reputation: 349

nawk -F'[<>]' '/<EmpName>|<SidNumber>|<EpisodeId>/{print $3}' *.dat
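
Note that this command prints each value on its own line. As a rough sketch, the values can be regrouped into one space-delimited row per file by piping them through paste, which joins every three input lines. This assumes every file contains each of the three tags exactly once and in the same order; the function name extract_rows is hypothetical, and awk stands in for nawk here (on systems such as Solaris, where plain awk may be an older implementation, keep nawk as in the command above).

```shell
# Hypothetical wrapper: one space-delimited row per Employee record.
# Assumes every file contains each of the three tags exactly once.
extract_rows() {
    awk -F'[<>]' '/<EmpName>|<SidNumber>|<EpisodeId>/ { print $3 }' "$@" |
        paste -d' ' - - -    # join each group of three lines into one row
}
```

For example, extract_rows *.dat > output.txt would write one row per file. If a file is missing one of the tags, the rows drift out of alignment, so this is only suitable when the data is known to be regular.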

Upvotes: 0
