Reputation: 173
How can I grep exact strings from an XML file? This is part of the XML file (the input file):
<Sector sectorNumber="1">
<Cell cellNumber="1" cellIdentity="42901" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellIdentity="42905" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellIdentity="42902" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellIdentity="42906" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellIdentity="42903" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellIdentity="42907" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
I want to grep every cellIdentity="..." attribute, so basically the output should look like this:
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
When I tried grep -E "cellIdentity=" input.xml, I got the whole line each time, but I need only the output shown above.
Upvotes: 0
Views: 713
Reputation: 2160
Jordan@workstation:~$ egrep -o "cellIdentity=\"[0-9]{5}\"" ddff
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
-o outputs only the matching string, not the entire line.
[0-9]{5} matches exactly 5 consecutive digits.
The rest of the pattern is the literal text you expect to see :)
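To see what the fixed-width {5} quantifier changes in practice, here is a small sketch on a made-up two-line sample. The temporary file and the 4-digit identity are assumptions for illustration and are not part of the asker's data.

```shell
# Throwaway sample file; the second cell has a made-up 4-digit identity.
tmp=$(mktemp)
printf '%s\n' \
  '<Cell cellNumber="1" cellIdentity="42901" cellRange="35000" />' \
  '<Cell cellNumber="2" cellIdentity="4290" cellRange="35000" />' > "$tmp"

# {5} requires exactly five digits between the quotes,
# so only the first line produces a match.
fixed=$(grep -oE 'cellIdentity="[0-9]{5}"' "$tmp")

# + accepts one or more digits, so both lines produce a match.
any=$(grep -oE 'cellIdentity="[0-9]+"' "$tmp")

printf 'fixed width:\n%s\n' "$fixed"
printf 'any length:\n%s\n' "$any"
rm -f "$tmp"
```

If the identities are not guaranteed to be five digits long, the + form is probably the safer choice.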
Upvotes: 2
Reputation: 123680
To extract data from XML files, use XML tools:
xmlstarlet sel -t -m "//Cell" -m @cellIdentity -v . -n file.xml
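This is far less fragile than grep and handles many more XML files and edge cases. Note that it prints only the attribute values (42901, 42905, ...), without the cellIdentity= prefix.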
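As one illustration of that fragility: XML allows single-quoted attribute values and whitespace around the = sign, and a grep pattern written for double quotes silently skips both. A minimal sketch using grep alone, with made-up sample lines (the single-quoted and spaced variants are assumptions, not taken from the asker's file):

```shell
# Three equivalent ways to write the attribute in XML (sample data is invented).
tmp=$(mktemp)
printf '%s\n' \
  '<Cell cellNumber="1" cellIdentity="42901" />' \
  "<Cell cellNumber=\"2\" cellIdentity='42905' />" \
  '<Cell cellNumber="3" cellIdentity = "42902" />' > "$tmp"

# The double-quote-only pattern finds just the first form.
matched=$(grep -o 'cellIdentity="[0-9]*"' "$tmp")

# Count of lines that actually carry the attribute.
total=$(grep -c 'cellIdentity' "$tmp")

echo "lines with the attribute: $total"
echo "values the pattern found:"
echo "$matched"
rm -f "$tmp"
```

An XML-aware tool parses all three forms the same way, which is why it needs no pattern tweaks.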
Upvotes: 1
Reputation: 59320
Use grep's -o option to print only the matched part of each line. With your example in a file named t.txt:
grep -o 'cellIdentity="[0-9]*"' t.txt
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
Upvotes: 2
Reputation: 33387
You could use this regular expression:
grep -oP 'cellIdentity="\d*"' file
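The -P option (Perl-compatible regular expressions, needed for \d) is a GNU grep feature and may be missing from other grep implementations. A possible equivalent, sketched here under the assumption that only -E and -o are available, uses the POSIX class [[:digit:]]. It also uses + instead of *, so an empty cellIdentity="" is not reported. The sample lines are made up.

```shell
# Invented sample: one normal identity and one empty one.
tmp=$(mktemp)
printf '%s\n' \
  '<Cell cellNumber="1" cellIdentity="42903" cellRange="35000" />' \
  '<Cell cellNumber="2" cellIdentity="" cellRange="35000" />' > "$tmp"

# [[:digit:]]+ means one or more digits and works with -E, no -P required.
portable=$(grep -oE 'cellIdentity="[[:digit:]]+"' "$tmp")

echo "$portable"
rm -f "$tmp"
```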
Upvotes: 1