Reputation: 109
I wrote an awk expression to parse my XML into CSV, but it doesn't work. Could you help me with it, please? I'm doing it this way because I can't use a parser like xmlstarlet on the server.
Here is my XML:
<?xml version="1.0"?>
<root>
<record>
<country>US</country>
<data>
<id_client>50C</id_client>
<mail>[email protected]</mail>
<adress>10 </adress>
<num_tel>001</num_tel>
<name>toto</name>
<birth>01/30/008</birth>
</data>
<data>
<id_client>100K</id_client>
<mail>[email protected]</mail>
<adress>10 </adress>
<num_tel>002</num_tel>
<name>toto2</name>
<birth>01/30/011</birth>
</data>
</record>
<record>
<country>China</country>
<data>
<id_client>99E</id_client>
<mail>[email protected]</mail>
<adress>10 </adress>
<num_tel>003</num_tel>
<name>toto3</name>
<birth>01/30/0008</birth>
</data>
<data>
<id_client>77B</id_client>
<mail>[email protected]</mail>
<adress>10 </adress>
<num_tel>004</num_tel>
<name>toto4</name>
<birth>2001/05/01</birth>
</data>
</record>
</root>
The output I need:
country;id_client;name
US;50C;toto
US;100K;toto2
China;99E;toto3
China;77B;toto4
And finally, the command I'm trying to update:
/<country>/{sub(".*<country[^>]+><[^>]+>","",$0);sub("<.*","",$0);s=s";"$0}/<\/country>/{sub("^;","",s);print s;s=""}
Upvotes: 1
Views: 215
Reputation: 203129
If your data's always laid out one entry per line like you show, with no wacky white space intervening:
$ cat tst.awk
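BEGIN {
    # Split each line on "<" or ">", so $2 is the tag name and $3 its value.
    FS="[><]"; OFS=";"
    # The tags to output, in column order.
    n = split("country id_client name",tags,/ /)
    # Print the CSV header.
    for (i=1; i<=n; i++) {
        printf "%s%s", tags[i], (i<n?OFS:ORS)
    }
}
# Remember the most recent value seen for every tag.
{ tag2val[$2] = $3 }
# At the end of each <data> block, print one CSV row.
/<\/data>/ {
    for (i=1; i<=n; i++) {
        printf "%s%s", tag2val[tags[i]], (i<n?OFS:ORS)
    }
}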
$ awk -f tst.awk file
country;id_client;name
US;50C;toto
US;100K;toto2
China;99E;toto3
China;77B;toto4
If you care about different or additional tags in the future, just add them to the list in the split() call.
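For instance, here is a minimal sketch of the same script with num_tel appended to the tag list. The file name sample.xml and its cut-down contents are hypothetical, made up only so the example is self-contained; the awk body is otherwise the script shown above.

```shell
# Hypothetical, shortened copy of the question's data (one <data> per record).
cat > sample.xml <<'EOF'
<root>
<record>
<country>US</country>
<data>
<id_client>50C</id_client>
<num_tel>001</num_tel>
<name>toto</name>
</data>
</record>
<record>
<country>China</country>
<data>
<id_client>99E</id_client>
<num_tel>003</num_tel>
<name>toto3</name>
</data>
</record>
</root>
EOF

# Same logic as tst.awk; the only change is "num_tel" added to the split() list.
out=$(awk '
BEGIN {
    FS = "[><]"; OFS = ";"
    n = split("country id_client name num_tel", tags, / /)
    for (i = 1; i <= n; i++)
        printf "%s%s", tags[i], (i < n ? OFS : ORS)
}
{ tag2val[$2] = $3 }
/<\/data>/ {
    for (i = 1; i <= n; i++)
        printf "%s%s", tag2val[tags[i]], (i < n ? OFS : ORS)
}
' sample.xml)

printf '%s\n' "$out"
```

The column order follows the order of the names in the split() list. One thing to keep in mind: tag2val keeps each value until it is overwritten. That is what lets country carry across the data blocks of a record, but it also means a tag that is missing from one data block would repeat the value from the previous one.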
Upvotes: 3