user3319356

Reputation: 173

Grep exact strings from an XML file

How can I grep exact strings out of an XML file? This is part of the XML file (the input file):

 <Sector sectorNumber="1">
    <Cell cellNumber="1" cellIdentity="42901" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42905" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  </Sector>
  <Sector sectorNumber="2">
    <Cell cellNumber="1" cellIdentity="42902" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42906" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  </Sector>
  <Sector sectorNumber="3">
    <Cell cellNumber="1" cellIdentity="42903" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42907" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />   
  </Sector>

I want to grep every cellIdentity="..." attribute, so the output should basically look like this:

cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"

When I tried grep -E "cellIdentity=" input.xml, I got each whole matching line, but I need only the attributes shown above.

Upvotes: 0

Views: 713

Answers (4)

PradyJord

Reputation: 2160

Jordan@workstation:~$ grep -Eo "cellIdentity=\"[0-9]{5}\"" ddff
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"

-o only outputs the matching string, and not the entire line.

[0-9]{5} matches exactly five digits.

The rest of the pattern matches the literal text cellIdentity= and the surrounding double quotes :)
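
One caveat with {5}: an identity that is shorter or longer than five digits would be skipped silently. A minimal sketch of the difference, assuming a made-up two-line sample (the value 429 and the temp file are hypothetical, not from the question):

```shell
# Hypothetical sample: one five-digit identity and one three-digit identity.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Sector sectorNumber="1">
  <Cell cellNumber="1" cellIdentity="42901" cellRange="35000" />
  <Cell cellNumber="2" cellIdentity="429" cellRange="35000" />
</Sector>
EOF

# {5} requires exactly five digits between the quotes, so "429" is not matched.
fixed=$(grep -Eo 'cellIdentity="[0-9]{5}"' "$sample")

# + accepts one or more digits, so both attributes are matched.
any_len=$(grep -Eo 'cellIdentity="[0-9]+"' "$sample")

printf '%s\n' "$fixed" "---" "$any_len"
rm -f "$sample"
```

If every identity in the real file is known to be five digits, {5} is the stricter and arguably safer choice; otherwise + is more forgiving.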

Upvotes: 2

that other guy

Reputation: 123680

To extract data from XML files, use XML tools:

xmlstarlet sel -t -m "//Cell" -m @cellIdentity -v . -n file.xml

This is far less fragile than grep: it still works when attributes use single quotes, have whitespace around the = sign, or when an element is split across several lines.

Upvotes: 1

damienfrancois

Reputation: 59320

Use the -o option of grep to get only the matched pattern. With your example in a file named t.txt:

grep -o 'cellIdentity="[0-9]*"' t.txt 
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
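
If only the numbers are wanted rather than the whole attribute, one possible follow-up is to pipe the -o output through cut and take the field between the double quotes. A small sketch, assuming a temp file holding two lines copied from the question (the file and the variable names are just for illustration):

```shell
# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <Sector sectorNumber="1">
    <Cell cellNumber="1" cellIdentity="42901" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42905" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  </Sector>
EOF

# grep -o prints each matched attribute on its own line;
# cut splits on the double quote and keeps field 2, the value itself.
ids=$(grep -o 'cellIdentity="[0-9]*"' "$sample" | cut -d '"' -f 2)

printf '%s\n' "$ids"
rm -f "$sample"
```

This relies on the attribute being written with double quotes, exactly as in the sample input.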

Upvotes: 2

user000001

Reputation: 33387

You could use this Perl-compatible regular expression (-P is a GNU grep option):

grep -oP 'cellIdentity="\d*"' file

Upvotes: 1
