CjRobin

Reputation: 161

Selecting text within two patterns using the command line

I am trying to extract text from a large file, but I am only interested in the text between two patterns.

Sample text looks like this:

<account>0409</account><name>Charles</name><type>R</type><accountStatus>active</accountStatus>

My desired output is only the text within the name tag, with nothing before and nothing after. For example:

Output: Charles

In this case the starting pattern is <name> and the ending pattern is </name>.

How can I achieve this using grep/sed/awk?

Upvotes: 1

Views: 98

Answers (2)

Ed Morton

Reputation: 203597

Using GNU awk for multi-char RS:

$ awk -v RS='</?name>' '!(NR%2)' file
Charles

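Setting RS to the regexp </?name> splits the input into records at every opening or closing name tag, so every even-numbered record is the text between a <name> and its </name>. This works whether or not there are newlines anywhere in your input file, and no matter how many times <name>...</name> appears on one line or is split across lines. It only requires that <name> and </name> always appear as pairs in the input file: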

$ cat file
<name>Charles</name><name>William</name>
<name>Edward
</name>
<name>   John Boy Walton   </name>
$ awk -v RS='</?name>' '!(NR%2)' file
Charles
William
Edward

   John Boy Walton

To strip any leading/trailing white space from the names, it's a simple tweak:

$ awk -v RS='[[:space:]]*</?name>[[:space:]]*' '!(NR%2)' file
Charles
William
Edward
John Boy Walton
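
For an awk without multi-char RS support, a rough portable sketch is to loop over each line with match() instead. This is a swapped-in technique, not the RS approach above, and it assumes each <name>...</name> pair sits on a single line. The function name extract_names is hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: prints the text of every <name>...</name> pair,
# one per output line. Assumes a pair never spans a line break.
extract_names() {
    awk '{
        line = $0
        # Repeatedly find the next complete pair on this line.
        while (match(line, /<name>[^<]*<\/name>/)) {
            # Skip the 6 characters of "<name>" and drop the 7 of "</name>".
            val = substr(line, RSTART + 6, RLENGTH - 13)
            # Trim leading and trailing blanks and tabs.
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print val
            # Continue searching after the pair just handled.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '%s\n' '<name>Charles</name><name>William</name>' \
              '<name>   John Boy Walton   </name>' | extract_names
```

Unlike the RS version, this does not require the tags to be balanced across the whole file, but it will miss a name that is split across lines.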

Upvotes: 2

Jotne

Reputation: 41456

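Using awk, if the <name> element is on a line of its own (so the name is the third field when splitting on < and >):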

awk -F"<|>" '/name/ {print $3}' file
Charles

If all the data is on one line, use:

awk -v RS="<" -F\> '/name/{print $2;exit}' file
Charles
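
The field number in the first command depends on where the tag sits in the line. As a hedged sketch of the same field-splitting idea, the loop below looks for a field that is exactly "name" and prints the field after it, so the position no longer matters. The function name name_from_fields is hypothetical:

```shell
# Hypothetical helper: split each line on < and >, then print the field
# that follows any field equal to "name" (the tag name without brackets).
name_from_fields() {
    awk -F'[<>]' '{
        for (i = 2; i < NF; i++)
            if ($i == "name")
                print $(i + 1)
    }' "$@"
}

printf '%s\n' '<account>0409</account><name>Charles</name><type>R</type><accountStatus>active</accountStatus>' \
    | name_from_fields
```

Comparing the field with == rather than matching /name/ avoids false hits on tags that merely contain the string "name".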

Upvotes: 1
