Reputation: 31
I have the following code that parses XML to display the node value of each element in a file.
#Abbreviation - symbol
cat elements/*.xml | egrep "<symbol>.*</symbol>" |sed -e "s/<symbol>\(.*\)<\/symbol>/\1/"|tr "|" " "
#Weight - atomic-weight
cat elements/*.xml | egrep "<atomic-weight>.*</atomic-weight>" |sed -e "s/<atomic-weight>\(.*\)<\/atomic-weight>/\1/"|tr "|" " "
#Number atomic-number
cat elements/*.xml | egrep "<atomic-number>.*</atomic-number>" |sed -e "s/<atomic-number>\(.*\)<\/atomic-number>/\1/"|tr "|" " " > number
How can I format the three of these outputs as a table instead of one huge sequential list?
Sample Data -
File1 -
<symbol>Ag</symbol>
<atomic-number>47</atomic-number>
<atomic-weight>107.8682</atomic-weight>
File2 -
<symbol>Ba</symbol>
<atomic-number>56</atomic-number>
<atomic-weight>137.327</atomic-weight>
Desired Output -
Symbol Number Weight
Ag 47 107.8682
Ba 56 137.327
Upvotes: 0
Views: 1013
Reputation: 1811
Provided the input files are XML external general parsed entities, and so concatenate to well-formed XML if wrapped in a root element, you can use xmlstarlet to process them in one go:
printf '<doc>%s</doc>\n' "$(cat file*.xml)" |
xmlstarlet select --template --var ofs="'$(printf "\t")'" \
--value-of 'concat("Symbol", $ofs, "Number", $ofs, "Weight")' --nl \
--match '*/*[position() mod 3 = 1]' --sort 'A:T:-' '.' \
--value-of 'concat(., $ofs, following-sibling::*[1], $ofs, following-sibling::*[2])' --nl
printf: wrap a document element around input
--var ofs: define output field separator
--value-of (first occurrence): emit header
--sort 'A:T:-' '.': sort Ascending, as Text; to sort numerically by atomic number instead, use --sort 'A:N:-' 'following-sibling::*[1]'
Output:
Symbol Number Weight
Ag 47 107.8682
Ba 56 137.327
Upvotes: 0
Reputation: 22291
If the only thing you know is that it is valid XML, you had better use an XML parser. Such parsers come included, for instance, with Ruby or Perl. This would also allow you to parse file content which looks like
<atomic-weight>
107.8682
</atomic-weight>
If however you can ensure that the input files follow exactly the format you have posted, you could do something like:
for file in File1 File2
do
tr '<>' '  ' <"$file" | cut -d ' ' -f 3
done
If you need to format the data into columns at particular positions, you could do something like
for file in File1 File2
do
printf ' put your format specification here ' $(tr '<>' '  ' <"$file" | cut -d ' ' -f 3)
done
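As a minimal sketch of what the format specification might look like, assuming each file holds exactly three lines in the order symbol, atomic-number, atomic-weight (as in the sample data). The function name print_table and the column width of 9 are arbitrary choices for illustration, not something the question prescribes:

```shell
#!/bin/sh
# Sketch only: relies on each input file having exactly three lines,
# ordered symbol, atomic-number, atomic-weight.
# print_table is a hypothetical helper name.
print_table() {
    # Header row, each column left-aligned in a 9-character field.
    printf '%-9s %-9s %-9s\n' Symbol Number Weight
    for file in "$@"
    do
        # tr turns '<' and '>' into spaces; cut then picks the text
        # between the opening and closing tag on each line.
        # The $(...) is left unquoted on purpose so that the three
        # extracted values become three separate printf arguments.
        printf '%-9s %-9s %-9s\n' $(tr '<>' '  ' <"$file" | cut -d ' ' -f 3)
    done
}
```

It could be called as print_table File1 File2. Because the values are split on whitespace, this only works while no element value contains a space.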
Upvotes: 0
Reputation: 8621
Try this:
#!/bin/bash
printf '%-9s %-9s %-9s\n' "Symbol" "Number" "Weight"
for F in *.xml
do
symbol=$(grep -E "<symbol>.*</symbol>" "$F" | sed -e "s/.*<symbol>\(.*\)<\/symbol>.*/\1/")
number=$(grep -E "<atomic-number>.*</atomic-number>" "$F" | sed -e "s/.*<atomic-number>\(.*\)<\/atomic-number>.*/\1/")
weight=$(grep -E "<atomic-weight>.*</atomic-weight>" "$F" | sed -e "s/.*<atomic-weight>\(.*\)<\/atomic-weight>.*/\1/")
printf '%-9s %-9s %-9s\n' "$symbol" "$number" "$weight"
done
printf allows you to format the width, and the alignment within that width, of printed text (or numbers, or floats, ...).
With printf, '%-9s' means it will print the value in a field 9 characters wide, left-aligned. Without the -, it will align right.
printf does not output a newline unless you tell it to, which explains the \n.
I kept your grep ... | sed ... commands, but for 2 details: 1. Used grep -E instead of egrep. 2. Added .* at the beginning and end of your sed pattern to eliminate prefixes or suffixes around the <SOMETHING> tags.
The output I get is:
$ ./so.bash
Symbol Number Weight
Ag 47 107.8682
Ba 56 137.327
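To make the effect of the - flag visible, here is a small sketch that prints the same value left-aligned and right-aligned in a 9-character field. The | delimiters and the variable names left and right are only there for illustration:

```shell
#!/bin/sh
# '%-9s': pad to 9 characters, value on the left.
left=$(printf '|%-9s|' Ag)
# '%9s': pad to 9 characters, value on the right.
right=$(printf '|%9s|' Ag)
# printf adds no newline by itself, so \n is written explicitly.
printf '%s\n%s\n' "$left" "$right"
# prints:
# |Ag       |
# |       Ag|
```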
Upvotes: 2