Reputation: 161
I am trying to extract text from a large file, but I am only interested in the text between two patterns.
Sample text looks like this:
<account>0409</account><name>Charles</name><type>R</type><accountStatus>active</accountStatus>
My desired output is only the text within the name tag, with nothing before and nothing after. For example:
Output: Charles
In this case the starting pattern is <name> and the ending pattern is </name>.
How can I achieve this using grep/sed/awk?
Upvotes: 1
Views: 98
Reputation: 203597
Using GNU awk for multi-char RS:
$ awk -v RS='</?name>' '!(NR%2)' file
Charles
The above works whether or not there are newlines anywhere in your input file, and no matter how many times <name>...</name> appears on one line or is split across lines. It only requires that <name> and </name> always appear as pairs in the input file:
$ cat file
<name>Charles</name><name>William</name>
<name>Edward
</name>
<name> John Boy Walton </name>
$ awk -v RS='</?name>' '!(NR%2)' file
Charles
William
Edward
John Boy Walton
If you want to strip any leading or trailing whitespace from the names, it's a simple tweak:
$ awk -v RS='[[:space:]]*</?name>[[:space:]]*' '!(NR%2)' file
Charles
William
Edward
John Boy Walton
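To see why the `!(NR%2)` test picks out the names, it can help to print every record with its number. The following is only a sketch: it assumes an awk that accepts a regular expression as RS (GNU awk does), and the temporary file and the `names` variable are illustrative names, not part of the answer above.

```shell
# Write the sample line from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' '<account>0409</account><name>Charles</name><type>R</type><accountStatus>active</accountStatus>' > "$tmp"

# Print each record with its number. Text outside the tags lands in
# odd-numbered records, text between <name> and </name> in even ones.
awk -v RS='</?name>' '{ printf "%d: [%s]\n", NR, $0 }' "$tmp"

# Keep only the even-numbered records, as in the command above.
names=$(awk -v RS='</?name>' '!(NR%2)' "$tmp")
printf '%s\n' "$names"

rm -f "$tmp"
```

Because every opening and closing tag ends a record, the count stays in step only while the tags come in pairs, which is the requirement stated above.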
Upvotes: 2
Reputation: 41456
Using awk (this assumes the <name> element starts the line):
awk -F"<|>" '/name/ {print $3}' file
Charles
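As a rough illustration of how the fields are numbered when both `<` and `>` are separators, the sketch below runs the same command on two inputs. The variable names `per_line` and `one_line` are made up for this example.

```shell
# <name> starts the line: $1 is empty, $2 is "name", $3 is the value.
per_line=$(printf '%s\n' '<name>Charles</name>' |
  awk -F'<|>' '/name/ { print $3 }')

# The question's layout, with <account> first on the same line:
# $3 is now the account number rather than the name.
one_line=$(printf '%s\n' '<account>0409</account><name>Charles</name>' |
  awk -F'<|>' '/name/ { print $3 }')

printf '%s\n%s\n' "$per_line" "$one_line"
```

The second result is why a different command is needed when several tags share a line.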
If all the data is on one line, use:
awk -v RS="<" -F\> '/name/{print $2;exit}' file
Charles
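Since the question also mentions grep and sed, here is a minimal sketch of both, not a robust XML parser. It assumes at most one <name>...</name> pair per line for the sed variant, a name that contains no `<` for the grep variant, and a grep that supports `-o` (GNU and BSD grep do). The variable names are illustrative.

```shell
line='<account>0409</account><name>Charles</name><type>R</type><accountStatus>active</accountStatus>'

# sed: capture the text between the tags and print only lines that matched.
with_sed=$(printf '%s\n' "$line" |
  sed -n 's:.*<name>\(.*\)</name>.*:\1:p')

# grep -o prints each <name>...</name> element on its own line,
# then sed removes the tags themselves.
with_grep=$(printf '%s\n' "$line" |
  grep -o '<name>[^<]*</name>' | sed 's/<[^>]*>//g')

printf '%s\n%s\n' "$with_sed" "$with_grep"
```

The grep variant also copes with several names on one line, because each match is printed separately.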
Upvotes: 1