Reputation: 71
I have an XML file with input like the one below. I am trying to write a shell script that replaces the wildcards in the host attribute.
<Group>
<GroupEntry groupname="aM"/>
<GroupSubjectEntry host="*" name="root"/>
<GroupSubjectEntry host="*" name="apro"/>
<GroupSubjectEntry host="*" name="rock"/>
</Group>
<Group>
<GroupEntry groupname="ESB"/>
<GroupSubjectEntry host="*" name="esbsvc"/>
<GroupSubjectEntry host="*" name="retryt"/>
</Group>
<Group>
<GroupEntry groupname="Omega"/>
<GroupSubjectEntry host="*" name="omegauser"/>
</Group>
</GroupSet>
I have a text file that lists the hostnames for each group name, as below.
aM
hostname1
hostname2
ESB
hostname3
hostname4
Omega
hostname5
hostname6
hostname7
hostname8
hostname1
I am trying to go through the text file and change the XML file so that each wildcard is replaced with the hostnames of its group. The result I am trying to get is:
<Group>
<GroupEntry groupname="aM"/>
<GroupSubjectEntry host="hostname1" name="root"/>
<GroupSubjectEntry host="hostname1" name="apro"/>
<GroupSubjectEntry host="hostname1" name="rock"/>
<GroupSubjectEntry host="hostname2" name="root"/>
<GroupSubjectEntry host="hostname2" name="apro"/>
<GroupSubjectEntry host="hostname2" name="rock"/>
</Group>
<Group>
<GroupEntry groupname="ESB"/>
<GroupSubjectEntry host="hostname3" name="esbsvc"/>
<GroupSubjectEntry host="hostname3" name="retryt"/>
<GroupSubjectEntry host="hostname4" name="esbsvc"/>
<GroupSubjectEntry host="hostname4" name="retryt"/>
</Group>
<Group>
<GroupEntry groupname="Omega"/>
<GroupSubjectEntry host="hostname5" name="omegauser"/>
<GroupSubjectEntry host="hostname6" name="omegauser"/>
<GroupSubjectEntry host="hostname7" name="omegauser"/>
<GroupSubjectEntry host="hostname8" name="omegauser"/>
<GroupSubjectEntry host="hostname1" name="omegauser"/>
</Group>
</GroupSet>
I tried with sed and awk, as in the example below:
sed '/GroupSubjectEntry host="\*"/p' omegatest.xml|sed '0,/\*/s//host/'
but that only changes the first line.
I thought of running this through a for loop and using sed's p option, but there are too many variables involved. I am basically trying to replace the wildcards in the XML with the appropriate hostnames.
Can someone please help?
Upvotes: 0
Views: 100
Reputation: 133508
Could you please try the following, written and tested with GNU awk. Fair warning: XML-aware tools such as xmlstarlet are recommended for dealing with XML. Since the OP doesn't have those and can't use them, I came up with this one, but there is no guarantee that it will work with every kind of XML; it is written strictly for the shown samples only.
1st solution: As per OP's expected output:
awk '
# Note: only the <Group> blocks are reproduced; other lines such as </GroupSet> are not printed.
!NF{ next }                                   # skip empty lines in both files
FNR==NR{                                      # 1st file: the XML
  if($0 ~ /GroupEntry groupname="/){
    match($0,/"[^"]*/)
    val=substr($0,RSTART+1,RLENGTH-1)         # group name, e.g. aM
    match($0,/^ +/)
    spaces[val]=substr($0,RSTART,RLENGTH)     # leading indentation, empty if none
    namesVal[val]=$0                          # the whole GroupEntry line
    next
  }
  if($0 ~ /<GroupSubjectEntry host=/){
    match($0,/name="[^"]*/)
    names[val]=(names[val]?names[val] "\n":"")substr($0,RSTART+6,RLENGTH-6)
    next
  }
  if($0~/<Group>/ || $0~/<\/Group>/){
    rest[++count1]=$0                         # <Group> and </Group> lines, in order
  }
  next
}
!/hostname/{                                  # 2nd file: a group name line
  num=0
  if($0 in names){
    if(opened){ print rest[++count2] }        # close the previous group
    print rest[++count2]                      # open this group
    opened=1
    print namesVal[$0]
    check=$0
    num=split(names[$0],arr,"\n")
  }
  next
}
/^hostname/{                                  # 2nd file: a hostname line
  for(i=1;i<=num;i++){
    print spaces[check] "<GroupSubjectEntry host=\"" $0"\" name=\""arr[i]"\"/>"
  }
  next
}
END{
  if(opened){ print rest[++count2] }          # close the last group
}
' Input_file groupnames
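The FNR==NR test used here is the usual awk idiom for treating the first file on the command line differently from the rest: FNR restarts at 1 for every file while NR keeps counting, so the two are equal only while the first file is being read. A minimal sketch of the idiom on its own, where in_first is a hypothetical helper name used only for illustration:

```shell
# in_first FILE1 FILE2 (hypothetical helper): print the lines of FILE2
# whose first field also appears as a first field in FILE1.
in_first() {
  awk '
    FNR == NR { seen[$1] = 1; next }   # true only while the first file is read
    $1 in seen                         # second file: print matching lines
  ' "$1" "$2"
}
```

One caveat: if the first file is empty, FNR==NR also holds while the second file is read, so the idiom assumes a non-empty first file.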
2nd solution: If the OP doesn't mind the names coming out in a different order than in the expected output, one could try the following.
awk '
FNR==NR{                                          # 1st file: groupnames
  if(!NF){ next }                                 # skip empty lines
  if($0!~/^hostname/){ val=$0 }                   # a group name line
  else { arr[val]=(arr[val]?arr[val] ORS:"")$0 }  # collect the hostnames of that group
  next
}
/<GroupEntry groupname=/ && match($0,/".*"/){
  val=substr($0,RSTART+1,RLENGTH-2)               # current group name in the XML
}
/GroupSubjectEntry host=/{
  match($0,/^ +/)
  spaces=substr($0,RSTART,RLENGTH)                # leading indentation, empty if none
  match($0,/name="[^"]*/)
  name=substr($0,RSTART+6,RLENGTH-6)
  num=split(arr[val],arr1,"\n")
  for(i=1;i<=num;i++){                            # one line per hostname for this name
    print spaces "<GroupSubjectEntry host=\"" arr1[i]"\" name=\""name"\"/>"
  }
  next
}
1' groupnames Input_file
Also, for each name this prints one entry per hostname of its group, in the order the hostnames are listed, so the lines come out grouped by name rather than by hostname. I hope the OP is OK with it.
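If the host-by-host order of the expected output is needed while still reading the XML in a single pass, the wildcard lines of each group can be buffered and expanded when the closing </Group> line is reached. The following is only a minimal sketch of that idea, not a tested replacement for the solutions above: expand_hosts is a hypothetical helper name, and it assumes, like the samples, that hostnames start with "hostname", that every group ends with a </Group> line, and that wildcard entries are written exactly as host="*".

```shell
# expand_hosts GROUPS_FILE XML_FILE (hypothetical helper)
expand_hosts() {
  awk '
    # Print the buffered wildcard lines once per hostname of the current group.
    function flush(   i, j, nh, h, line) {
      nh = split(hosts[grp], h, "\n")
      if (nh == 0) {                       # unknown group: keep lines unchanged
        for (j = 1; j <= n; j++) print buf[j]
      }
      for (i = 1; i <= nh; i++) {
        for (j = 1; j <= n; j++) {
          line = buf[j]
          sub(/host="\*"/, "host=\"" h[i] "\"", line)
          print line
        }
      }
      n = 0
    }
    FNR == NR {                            # first file: group names and hostnames
      if (!NF) next
      if ($0 !~ /^hostname/) grp = $0
      else hosts[grp] = (hosts[grp] ? hosts[grp] "\n" : "") $0
      next
    }
    /<GroupEntry groupname="/ {            # remember the current group name
      match($0, /"[^"]*"/)
      grp = substr($0, RSTART + 1, RLENGTH - 2)
      n = 0
    }
    /<GroupSubjectEntry host="\*"/ {       # buffer wildcard entries
      buf[++n] = $0
      next
    }
    /<\/Group>/ { flush() }                # end of group: emit the expansion
    { print }                              # everything else passes through
  ' "$1" "$2"
}
```

It would be called as expand_hosts groupnames Input_file. Since sub() treats & specially in the replacement text, the sketch also assumes that hostnames contain no & characters.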
Upvotes: 1