Tahir Khalil

Reputation: 23

Using xmllint to parse a file

I need help parsing an XML file and converting its values so they can be stored as CSV.

The sample XML is below.

<list type='full' level='state' val='WI'>
<ac val='262'>
<ph val='0000000' />
<ph val='0003639' />
<ph val='0129292' />
</ac>
<ac val='363'>
<ph val='0000000' />
<ph val='0003639' />
</ac>
</list>

I need the output to look like this:

262, '0000000'
262, '0003639'
262, '0129292'
363, '0000000'
363, '0003639'

I tried looping over the file's entries, but the problem is that we don't know how many phones (ph) there are under each area code (ac), so the upper bound of the phone extraction loop (j) is unknown.

for i in {1..2}; do
    for j in {1..3}; do
        echo "i=$i, j=$j"
        xmllint  --xpath "concat(//ac[$i]/@val,',', //ac/ph[$j]/@val)" test.xml
    done
done

Is there a simpler way to do this with xmllint?

Thanks

Upvotes: 0

Views: 66

Answers (2)

LMC

Reputation: 12662

If the area codes are known in advance, you can loop over them:

for ac in 262 363; do
  xmllint --xpath "//ac[@val=$ac]/ph/@val" tmp.xml | sed -re "s/ val=([^=]+)/$ac,\1/g"
done

Result

262,"0000000"
262,"0003639"
262,"0129292"
363,"0000000"
363,"0003639"

To avoid parsing the file once per area code, collect the commands first and feed them to a single xmllint --shell session:

declare -a xcmd
# setup xmllint shell commands
for ac in $(xmllint --xpath '//ac/@val' tmp.xml | cut -d'=' -f2 | tr -d ' "'); do
  xcmd[${#xcmd[@]}]="cd ac-$ac"
  xcmd[${#xcmd[@]}]="cat //ac[@val=$ac]/ph/@val"
done

# run and parse commands
printf "%s\n" "${xcmd[@]}" | xmllint --shell tmp.xml 2>/dev/null | grep -v ' ----' | \
while read -r line; do
  if grep -q '^[/] > cd ac-[0-9]' <<<"$line"; then
    c=$(cut -d '-' -f2 <<<"$line")
  else
    echo "$line" | sed -nre "s/val=([^=]+)/$c,\1/p"
  fi
done | grep -v '^[/] >'
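
If neither the area codes nor the number of phones per area code is known, one option that stays with plain xmllint is to ask XPath's count() for the loop bounds and to scope ph[$j] to ac[$i] (the loop in the question uses //ac/ph[$j], which is not tied to the current ac). This is only a rough sketch: ac_ph_pairs is a made-up helper name, and the /list/ac/ph paths assume the layout of the sample in the question.

```shell
#!/usr/bin/env bash
# Sketch: print "ac, 'ph'" pairs using only xmllint.
# ac_ph_pairs is a hypothetical helper name, not part of xmllint.
ac_ph_pairs() {
  local file=$1 n_ac n_ph i j
  # count() returns a number, which xmllint prints on its own line
  n_ac=$(xmllint --xpath 'count(/list/ac)' "$file")
  for ((i = 1; i <= n_ac; i++)); do
    # number of ph children under the i-th ac only
    n_ph=$(xmllint --xpath "count(/list/ac[$i]/ph)" "$file")
    for ((j = 1; j <= n_ph; j++)); do
      # both sides of the pair are taken from the same ac[$i]
      xmllint --xpath "concat(/list/ac[$i]/@val, \", '\", /list/ac[$i]/ph[$j]/@val, \"'\")" "$file"
    done
  done
}
```

Call it as ac_ph_pairs test.xml. It starts one xmllint process per count and per output line, so it re-parses the file many times; that is fine for small files, but the --shell approach above is likely the better fit for large ones.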

Upvotes: 0

pmf

Reputation: 36033

Here's one way using xmlstarlet: iterate over /list/ac/ph, then concatenate the parent node's attribute value (../@val) with the current node's (@val):

xmlstarlet sel -t -m '/list/ac/ph' -v 'concat(../@val, ", ", @val)' --nl file.xml
262, 0000000
262, 0003639
262, 0129292
363, 0000000
363, 0003639

Here's another one using xq from kislyuk/yq, which has a built-in CSV generator with proper escaping:

xq -r '.list.ac[] | [."@val" | tonumber] + (.ph[] | [."@val"]) | @csv' file.xml
262,"0000000"
262,"0003639"
262,"0129292"
363,"0000000"
363,"0003639"

Upvotes: 1
