I am working on a way to easily parse XML in bash for a specific purpose. I got this working with some code I found on this site, and then rewrote everything around it because it worked so well. It currently uses a function, and the data has to be in a file before I can process it. Here it is in its working state:
[ ~]$ cat testxml.xml
<!DOCTYPE PARTS SYSTEM "parts.dtd">
<?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?>
<PARTS>
<TITLE>Computer Parts</TITLE>
<PART>
<ITEM>Motherboard</ITEM>
<MANUFACTURER>ASUS</MANUFACTURER>
<MODEL>P3B-F</MODEL>
<COST> 123.00</COST>
</PART>
<PART>
<ITEM>Video Card</ITEM>
<MANUFACTURER>ATI</MANUFACTURER>
<MODEL>All-in-Wonder Pro</MODEL>
<COST> 160.00</COST>
</PART>
<PART>
<ITEM>Sound Card</ITEM>
<MANUFACTURER>Creative Labs</MANUFACTURER>
<MODEL>Sound Blaster Live</MODEL>
<COST> 80.00</COST>
</PART>
<PART>
<ITEM> 20 inch Monitor</ITEM>
<MANUFACTURER>LG Electronics</MANUFACTURER>
<MODEL> 995E</MODEL>
<COST> 290.00</COST>
</PART>
</PARTS>
[ ~]$
[ ~]$ rdom () { local IFS=\> ; read -d \< E C ;} ; while rdom; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done < testxml.xml | xargs -L3
PART: ITEM: Motherboard COST: 123.00
PART: ITEM: Video Card COST: 160.00
PART: ITEM: Sound Card COST: 80.00
PART: ITEM: 20 inch Monitor COST: 290.00
[ ~]$
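To make it clearer how that one-liner works and why its loop ends, here is a minimal sketch of the same rdom loop spread over several lines with comments. It runs on a small inline sample (a hypothetical stand-in for testxml.xml) and adds -r to read so backslashes are kept literally; otherwise it follows the one-liner.

```bash
#!/usr/bin/env bash
# Expanded form of the rdom one-liner, with comments.
# read -d '<' consumes everything up to the next '<'; with IFS='>' a chunk
# such as "ITEM>Motherboard" is split into E=ITEM and C=Motherboard.
rdom() {
    local IFS='>'          # IFS change is local to the function
    read -r -d '<' E C     # -r added here so backslashes stay literal
}

# Hypothetical inline sample standing in for testxml.xml.
sample='<PART><ITEM>Motherboard</ITEM><COST>123.00</COST></PART>'

# rdom returns non-zero once read hits end of input without finding
# another '<', and that non-zero status is what ends the while loop.
while rdom; do
    if [[ $E = PART || $E = ITEM || $E = COST ]]; then
        echo "$E: $C"
    fi
done <<< "$sample"
```

Because read fails at end of input, the text after the last '<' (here the closing /PART tag) never reaches the loop body. For XML that last chunk is normally a closing tag, so nothing of interest is lost.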
As you can see, this pulls out the data I am looking for, and I can reformat it to suit my needs. However, I would much prefer to have it accept input from stdin, like this:
cat out.xml2 | IFS=\> ; until [ EOF ]; do read -d \< E C ; if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done;
This code never ends the loop. Maybe this is impossible; I also don't understand how the loop in the working version ends, since its condition is just rdom. I've tried this with a while loop and other variations, but I'm not sure how to detect that there is no more data so the loop can end. I feel like there may be a much better way to structure this that I'm completely missing. I like using stdin because it makes one-liners easy. The actual data I'm parsing is much larger and multi-dimensional; I created this example for testing, and the first version does work on the large data. In short, I'm trying to get this to parse from stdin rather than from a file. Any recommendations are much appreciated.
Jeff
Try:
$ rdom() { local IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done; }
$ rdom <out.xml2
PART:
ITEM: Motherboard
COST: 123.00
PART:
ITEM: Video Card
COST: 160.00
PART:
ITEM: Sound Card
COST: 80.00
PART:
ITEM: 20 inch Monitor
COST: 290.00
Or, without using the function definition but still taking input from stdin:
{ IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done; } <out.xml2
Because the question does not show the desired output, I don't know if this is what you want.
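One side effect worth knowing: when the brace group is redirected from a file, it runs in the current shell, so IFS is still set to '>' after the loop. A minimal sketch of one way to avoid that puts the assignment in front of read, so it applies only to that command (read is a regular builtin, so a prefix assignment is temporary). The inline document is a hypothetical stand-in for out.xml2.

```bash
#!/usr/bin/env bash
# In "{ IFS=\> ; while read ...; done; } <file" the brace group runs in the
# current shell, so IFS stays '>' afterwards. Writing the assignment as a
# prefix of read (IFS='>' read ...) limits it to that one command.
old_ifs=$IFS

# Hypothetical one-line document standing in for out.xml2.
xml='<PARTS><PART><ITEM>Sound Card</ITEM><COST>80.00</COST></PART></PARTS>'

while IFS='>' read -r -d '<' E C; do
    if [[ $E = PART || $E = ITEM || $E = COST ]]; then
        echo "$E: $C"
    fi
done <<< "$xml"

# IFS in this shell was never changed.
[[ $IFS == "$old_ifs" ]] && echo "IFS unchanged"
```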
Some comments:
cat out.xml2 | IFS=\> ;
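sends the text of out.xml2 to the variable assignment IFS=\>, which never reads it, so the text is discarded. In bash, each command of a pipeline also runs in its own subshell by default, so the assignment does not even change IFS in the current shell.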
until [ EOF ]; do read -d \< E C ; ...
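does not do what you want. In shell, the string EOF is just three characters: [ EOF ] only tests whether that string is non-empty, so it is always true (with until the body never runs; with while it never stops). By contrast,
while read -d \< E C ; do ...
will stop when the input is exhausted, because read returns a non-zero status at end of input.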
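Both comments can be checked with a small sketch. It assumes bash with lastpipe at its default (off); the sample strings are hypothetical.

```bash
#!/usr/bin/env bash
# 1) Piping into a bare assignment: nothing reads the piped text, and the
#    assignment runs in a subshell, so this shell's IFS is not changed.
old_ifs=$IFS
printf '<A>x</A>' | IFS='>'
[[ $IFS == "$old_ifs" ]] && echo "IFS unchanged"

# 2) [ EOF ] only tests that the string "EOF" is non-empty: always true.
[ EOF ] && echo "[ EOF ] is true"

# 3) read signals end of input through its exit status, which is what
#    "while read ...; do" relies on to stop.
printf 'a<b<' | {
    while IFS= read -r -d '<' chunk; do
        echo "got: $chunk"
    done
    echo "read stopped at end of input"
}
```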
To demonstrate that the above works with a pipe, not just with redirection from a file, try:
cat out.xml2 | rdom
Or:
cat out.xml2 | { IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done; }
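Since rdom reads stdin, any command can feed it, and its output can be piped onward as well. In the minimal sketch below, emit_parts is a hypothetical stand-in for whatever actually produces the XML, and rdom is the function from this answer with -r added and the echo arguments quoted.

```bash
#!/usr/bin/env bash
# rdom as in the answer, with -r added and quoted echo arguments.
rdom() {
    local IFS='>'
    while read -r -d '<' E C; do
        if [[ $E = PART || $E = ITEM || $E = COST ]]; then
            echo "$E: $C"
        fi
    done
}

# Hypothetical producer; it just prints a fixed document without newlines.
emit_parts() {
    printf '%s' '<PARTS><PART><ITEM>Motherboard</ITEM><COST>123.00</COST></PART>' \
                '<PART><ITEM>Video Card</ITEM><COST>160.00</COST></PART></PARTS>'
}

# rdom can sit in the middle of a pipeline like any other filter.
emit_parts | rdom | grep '^COST:'
```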
Continuing to use cat as a stand-in for a pipeline, this version reproduces the one-line-per-part format of the question's xargs -L3 output:
$ cat out.xml2 | { IFS=\> ; while read -d \< E C ; do case "$E" in PART) printf "%s:" "$E";; ITEM) printf " %s: %s" "$E" "$C";; COST) printf " %s: %s\n" "$E" "$C";; esac ; done; }
PART: ITEM: Motherboard COST: 123.00
PART: ITEM: Video Card COST: 160.00
PART: ITEM: Sound Card COST: 80.00
PART: ITEM: 20 inch Monitor COST: 290.00
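The case version ends each line when it sees COST, so it assumes COST is the last field in every PART. A sketch of an alternative, assuming every record is wrapped in PART tags, prints the line when the closing /PART tag arrives instead; parts_lines and the sample document are hypothetical.

```bash
#!/usr/bin/env bash
# parts_lines (hypothetical name) reads XML on stdin and prints one line per
# PART when its closing </PART> tag arrives, so ITEM and COST may appear in
# any order inside a PART.
parts_lines() {
    local E C item= cost=
    while IFS='>' read -r -d '<' E C; do
        case $E in
            PART)  item= cost= ;;      # start of a new record: reset fields
            ITEM)  item=$C ;;
            COST)  cost=$C ;;
            /PART) printf 'PART: ITEM: %s COST: %s\n' "$item" "$cost" ;;
        esac
    done
}

# Hypothetical sample; COST comes before ITEM in the first PART.
xml='<PARTS><PART><COST>80.00</COST><ITEM>Sound Card</ITEM></PART><PART><ITEM>Monitor</ITEM><COST>290.00</COST></PART></PARTS>'
printf '%s' "$xml" | parts_lines
```

Because the fields are reset at each opening PART tag, a PART without a COST prints an empty COST rather than repeating the previous one.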