How to extract multiple tag values from multiple XML files in Linux

Question

We need to extract multiple tag values from multiple files.

We have around 1000 files with data similar to:


  432361
  Stuart
  0251115
  2016-11-14T22:27:53.477+08:00
  682082
  323A6C86-76AA-E611-80DA-005056B46023

We need to extract EmpName, SidNumber and EpisodeId from all the files into a single file. We are able to get one value at a time, for example with this command:

nawk -F'[<>]' '//{print $3}' *.dat

But we need to get multiple tags from each file. The output format should be something similar to:

EmpName Stuart SidNumber 0251115 EpisodeId 682082
EmpName Stuart SidNumber 0251115 EpisodeId 682082

or at least space-delimited values:

Stuart 0251115 682082
Stuart 0251115 682082

Any help would be appreciated.

Thanks in advance, Vivek

VIPIN KUMAR · Accepted Answer

Try this (I created two sample files, f1.txt and f2.txt):

$ head f?.txt
==> f1.txt <==
 
      432361
      Stuart
      0251115
      2016-11-14T22:27:53.477+08:00
      682082
      323A6C86-76AA-E611-80DA-005056B46023
   

==> f2.txt <==
 
      432361
      vipin
      0251117
      2016-12-14T22:27:53.477+08:00
      682082
      323A6C86-76AA-E611-80DA-005056B46023

Processing...

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done
 Stuart 0251115 682082 
 vipin 0251117 682082

For properly aligned output, pipe the result through column -t:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done | column -t
Stuart  0251115  682082
vipin   0251117  682082

If the column command is not available, you can pad the fields with printf instead:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%-10s", $3OFS} END {print ""}' "$i"; done
Stuart    0251115   682082    
vipin     0251117   682082

awk's printf function lets you format the column values; here %-10s left-justifies each value in a 10-character field.
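
The question also asked for labeled output (EmpName Stuart SidNumber ...). A minimal sketch of that variant follows, using one awk process for all files instead of a shell loop. It assumes each value sits on its own line as <Tag>value</Tag>; the Employee root element and the temporary sample files are hypothetical, made up here only so the example runs on its own.

```shell
# Hypothetical sample files; the <Tag>value</Tag> layout and the
# <Employee> root element are assumptions, not taken from the real data.
tmpdir=$(mktemp -d)
printf '%s\n' '<Employee>' '  <EmpName>Stuart</EmpName>' \
  '  <SidNumber>0251115</SidNumber>' '  <EpisodeId>682082</EpisodeId>' \
  '</Employee>' > "$tmpdir/f1.txt"
printf '%s\n' '<Employee>' '  <EmpName>vipin</EmpName>' \
  '  <SidNumber>0251117</SidNumber>' '  <EpisodeId>682082</EpisodeId>' \
  '</Employee>' > "$tmpdir/f2.txt"

# With -F'[<>]', $2 is the tag name and $3 is its value.
# FNR == 1 marks the first line of each new file, so the row collected
# from the previous file is printed there; END prints the last row.
result=$(awk -F'[<>]' '
  FNR == 1 && NR > 1 { print row; row = "" }
  $2 == "EmpName" || $2 == "SidNumber" || $2 == "EpisodeId" {
      row = row (row == "" ? "" : " ") $2 " " $3
  }
  END { if (NR > 0) print row }
' "$tmpdir"/f?.txt)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Matching on $2 rather than on a regex over the whole line avoids picking up tags whose names merely contain EmpName, SidNumber or EpisodeId. For the real data, replace the glob with *.dat and redirect the output to a file. Note that a completely empty input file produces no row in this sketch.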
