Rczone

Reputation: 501

Grep and filter out values from a file

I need to grep values from an XML file in the shell. Sample file below: test.xml

<wtc-import>
  <name>WTCImportedService-288-rap04</name>
  <resource-name>CAC040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAC040F</remote-name>
</wtc-import>
<wtc-import>
  <name>WTCImportedService-289-rap04</name>
  <resource-name>CAD040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAD040F</remote-name>
</wtc-import>
<wtc-import>
  <name>WTCImportedService-290-rap04</name>
  <resource-name>CAE040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAE040F</remote-name>
</wtc-import>
<wtc-import>
  <name>WTCImportedService-289-rap04</name>
  <resource-name>CAD040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAD040F</remote-name>
</wtc-import>

I have to grep all values associated with <resource-name> in the file and, as a last step, if any duplicate resource name is present, remove the duplicates from the output file.

Expected output:

CAC040F
CAD040F
CAE040F

The resource CAD040F is a duplicate, so it appears only once in the expected output.

Tried:

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' 

This works well. How can I filter out the duplicates after that?

Upvotes: 0

Views: 260

Answers (4)

NeronLeVelu

Reputation: 10039

Just a speed optimization compared to the answer from @stack0114106, which already does the job:

awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml
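
To see why this works: with -F '[<>]', a line such as <resource-name>CAC040F</resource-name> splits into the leading whitespace as $1, the tag name as $2, and the value as $3. Merely referencing List[$3] creates that key, so the "in" test succeeds only the first time a value appears. Below is a minimal sketch of that behaviour; the printf lines are hypothetical inline data standing in for test.xml, and the variable name out is only for illustration.

```shell
# Hypothetical inline sample standing in for test.xml
out=$(printf '%s\n' \
  '  <resource-name>CAC040F</resource-name>' \
  '  <remote-name>CAC040F</remote-name>' \
  '  <resource-name>CAD040F</resource-name>' \
  '  <resource-name>CAC040F</resource-name>' |
  awk -F '[<>]' '$2 == "resource-name" && !($3 in List) { print $3; List[$3] }')

# Prints each resource name once, in first-seen order
echo "$out"
```

Comparing $2 exactly also means that lines for other tags, such as remote-name, are skipped without a regex match.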

Upvotes: 1

stack0114106

Reputation: 8711

You can do it with a single awk command:

awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml

With your sample XML file:

$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F

$

Upvotes: 1

tshiono

Reputation: 22012

If using a bash regex is an option, please try the following:

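# Associative arrays require bash 4 or later.
declare -A name
# Match the <resource-name> element the question asks about.
regex="<resource-name>([^<]+)</resource-name>"

while read -r line; do
    if [[ $line =~ $regex ]]; then
        # Using the value as a key drops duplicates automatically.
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"

# Note: the key order of an associative array is not guaranteed.
for i in "${!name[@]}"; do
    echo "$i"
done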
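
Since bash does not guarantee the iteration order of associative-array keys, the loop above may print the names in a different order than they appear in the file. If file order matters, one possible variant is to append each value to an indexed array the first time it is seen. This is only a sketch, assuming bash 4 or later; the here-document is hypothetical sample data, and the names seen, result, and out are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: de-duplicate while keeping first-seen order (needs bash 4+).
declare -A seen
result=()
regex='<resource-name>([^<]+)</resource-name>'

while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        value=${BASH_REMATCH[1]}
        # Record the value only the first time it shows up
        if [[ -z ${seen[$value]+x} ]]; then
            seen[$value]=1
            result+=("$value")
        fi
    fi
done <<'EOF'
<wtc-import>
  <resource-name>CAE040F</resource-name>
</wtc-import>
<wtc-import>
  <resource-name>CAC040F</resource-name>
</wtc-import>
<wtc-import>
  <resource-name>CAE040F</resource-name>
</wtc-import>
EOF

out=$(printf '%s\n' "${result[@]}")
echo "$out"
```

To run it against the real file, the here-document would be replaced with a redirect from test.xml.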

Upvotes: 0

wooknight

Reputation: 124

If you are already getting the output and are just looking to remove duplicates, the easiest way is to pipe the output to sort and then to uniq, so your command will look like this:

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq
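
Two things are worth knowing about this pipeline: sort -u is a standard shorthand for sort | uniq, and both return the values in sorted order rather than file order. If the original order matters, the awk '!seen[$0]++' idiom can take the place of the sort. A small sketch of the difference follows; the values are hypothetical inline data standing in for the output of the grep/awk pipeline, and the variable names are only for illustration.

```shell
# Hypothetical values standing in for the extracted resource names
values=$(printf '%s\n' CAE040F CAC040F CAD040F CAC040F)

# Sorted, duplicates removed (same result as: sort | uniq)
sorted=$(printf '%s\n' "$values" | sort -u)

# First-seen order preserved, duplicates removed
ordered=$(printf '%s\n' "$values" | awk '!seen[$0]++')

echo "$sorted"
echo "$ordered"
```

For the sample in the question the two approaches give the same list, because the names already appear in sorted order.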

Upvotes: 0
