Reputation: 173
I have an XML file in the format below, and I want to transform it into the CSV output shown further down. Unfortunately, I'm not allowed to install xmlstarlet or any other XML parser (I only have xmllint). How can I do this with, for example, awk or sed?
<xn:VsDataContainer id="site00881">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008811,ManagedElement=1</es:listOfNe>
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008819,ManagedElement=1</es:listOfNe>
</xn:VsDataContainer>
<xn:VsDataContainer id="site00882">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008821,ManagedElement=1</es:listOfNe>
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008829,ManagedElement=1</es:listOfNe>
</xn:VsDataContainer>
<xn:VsDataContainer id="site00883">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=ASDTUR13,MeContext=rbs008831,ManagedElement=1</es:listOfNe>
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=ASDTUR_SIU,MeContext=siu008832,ManagedElement=siu008832</es:listOfNe>
</xn:VsDataContainer>
<xn:VsDataContainer id="site00884">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008841,ManagedElement=1</es:listOfNe>
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008849,ManagedElement=1</es:listOfNe>
</xn:VsDataContainer>
The output should be in CSV format:
rbs008811,site00881
rbs008819,site00881
rbs008821,site00882
rbs008829,site00882
rbs008831,site00883
siu008832,site00883
rbs008841,site00884
rbs008849,site00884
Upvotes: 0
Views: 712
Reputation: 247162
With the usual reservations about parsing XML:
gawk -v OFS=, '
match($0, /VsDataContainer id="([^"]+)/, m) {container = m[1]}
match($0, /MeContext=([^,]+)/, m) {print m[1], container}
' file
If you don't have GNU awk:
awk -v OFS=, '
    /VsDataContainer id="/ {
        sub(/.*id="/, "")
        sub(/".*/, "")
        container = $0
    }
    /MeContext=/ {
        sub(/.*MeContext=/, "")
        sub(/,.*/, "")
        print $0, container
    }
' file
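To try the portable version without touching the real data, here is a minimal sketch that wraps the same logic in a shell function and runs it on a two-container excerpt of the sample from the question. The function name `xml_to_csv` and the temporary file are assumptions made for illustration; the only other change is that the `sub()` calls work on copies of the line instead of modifying `$0`.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example): prints
# "MeContext,container-id" pairs from the files given as arguments,
# or from standard input when no file is given.
xml_to_csv() {
    awk -v OFS=, '
        # Remember the id of the container we are currently inside.
        /VsDataContainer id="/ {
            container = $0
            sub(/.*id="/, "", container)
            sub(/".*/, "", container)
        }
        # Emit the MeContext value together with that id.
        /MeContext=/ {
            ne = $0
            sub(/.*MeContext=/, "", ne)
            sub(/,.*/, "", ne)
            print ne, container
        }
    ' "$@"
}

# Demo on an excerpt of the sample data, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<xn:VsDataContainer id="site00883">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=ASDTUR13,MeContext=rbs008831,ManagedElement=1</es:listOfNe>
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=ASDTUR_SIU,MeContext=siu008832,ManagedElement=siu008832</es:listOfNe>
</xn:VsDataContainer>
<xn:VsDataContainer id="site00884">
<es:listOfNe>SubNetwork=NL1_R,SubNetwork=AHPTUR14,MeContext=rbs008841,ManagedElement=1</es:listOfNe>
</xn:VsDataContainer>
EOF

xml_to_csv "$sample"
# rbs008831,site00883
# siu008832,site00883
# rbs008841,site00884

rm -f "$sample"
```

Like the original, this relies on each element sitting on its own line, so it will not cope with reflowed or minified XML.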
Upvotes: 0
Reputation: 7959
I would help you with xmllint, but your XML file doesn't seem to be valid.
Anyway, here's a quick and dirty solution, which you should probably avoid:
grep -Po '(?<=MeContext=)[^,]+|site\d+' file.xml | awk '/^site/{site=$1; next} {print $1","site}'
rbs008811,site00881
rbs008819,site00881
rbs008821,site00882
rbs008829,site00882
rbs008831,site00883
siu008832,site00883
rbs008841,site00884
rbs008849,site00884
Upvotes: 2