Reputation: 501
I need to extract values from an XML file in a shell script. Sample file (test.xml):
<wtc-import>
<name>WTCImportedService-288-rap04</name>
<resource-name>CAC040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAC040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-289-rap04</name>
<resource-name>CAD040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAD040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-290-rap04</name>
<resource-name>CAE040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAE040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-289-rap04</name>
<resource-name>CAD040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAD040F</remote-name>
</wtc-import>
I have to extract all values of the resource-name element in the file and, if any resource name appears more than once, remove the duplicates from the output.
Expected output:
CAC040F
CAD040F
CAE040F
The resource CAD040F is a duplicate, so it appears only once in the expected output.
Tried:
grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}'
This works well, but how do I filter out the duplicates after that?
Upvotes: 0
Views: 260
Reputation: 10039
Just a speed optimization compared to the answer from @stack0114106, which already does the job:
awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml
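Besides speed, comparing $2 exactly also avoids picking up other elements whose name merely contains "resource-name". A small sketch of that difference, assuming a made-up element called old-resource-name (it is not in the question's file) and a hypothetical helper function exact_names:

```shell
# Hypothetical input: the second line is a different element whose
# name only contains the string "resource-name".
sample='<resource-name>CAC040F</resource-name>
<old-resource-name>ZZZ999</old-resource-name>
<resource-name>CAC040F</resource-name>'

# Exact comparison on the tag name ($2), with first-seen deduplication.
# List[$3] on its own creates the array element without assigning a value.
exact_names() {
  awk -F '[<>]' '$2 == "resource-name" && !($3 in List) { print $3; List[$3] }'
}

printf '%s\n' "$sample" | exact_names
```

Here only CAC040F is printed. A pattern match such as /resource-name/ would also match the second line and print ZZZ999.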
Upvotes: 1
Reputation: 8711
You can do it with a single awk command:
awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
With your sample XML file:
$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F
$
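In case the !seen[$3]++ part looks cryptic, here is a commented sketch of what happens. The three-line sample and the function name unique_names are only for illustration, not part of your file:

```shell
# Hypothetical cut-down sample standing in for test.xml.
sample='<resource-name>CAC040F</resource-name>
<resource-name>CAD040F</resource-name>
<resource-name>CAD040F</resource-name>'

# With -F"[<>]" a line like <resource-name>CAD040F</resource-name> splits into
#   $1=""  $2="resource-name"  $3="CAD040F"  $4="/resource-name"  $5=""
# seen[$3]++ yields the previous count (0 on the first sighting) and then
# increments it, so !seen[$3]++ is true only the first time a value shows up.
unique_names() {
  awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 }'
}

printf '%s\n' "$sample" | unique_names
```

This prints CAC040F and CAD040F once each, in the order they first appear in the input.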
Upvotes: 1
Reputation: 22012
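If bash regex is an option (bash 4 or later, for associative arrays), try the following:
# Associative array: each resource name is stored once as a key
declare -A name
regex="<resource-name>([^<]+)</resource-name>"
while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        # Assigning the same key again simply overwrites it, which removes duplicates
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"
# Note: the iteration order of associative array keys is not guaranteed
for i in "${!name[@]}"; do
    echo "$i"
done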
Upvotes: 0
Reputation: 124
If you are already getting the output and just want to remove duplicates, the easiest way is to pipe the output to sort and then to uniq, so your command will look like this:
grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq
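Note that sorting changes the order of the values. A rough sketch of the two options, using made-up values that are deliberately out of alphabetical order (the function names dedupe_sorted and dedupe_in_order are just for illustration):

```shell
# Hypothetical extracted values, not in alphabetical order.
values='CAE040F
CAC040F
CAE040F'

# sort -u gives the same result as sort | uniq: unique values, sorted.
dedupe_sorted() { sort -u; }

# awk keeps only the first occurrence of each line and preserves input order.
dedupe_in_order() { awk '!seen[$0]++'; }

printf '%s\n' "$values" | dedupe_sorted
printf '%s\n' "$values" | dedupe_in_order
```

The first call prints CAC040F then CAE040F; the second prints CAE040F then CAC040F. If the original order matters, the awk filter can be appended to your pipeline instead of sort | uniq.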
Upvotes: 0