rmhartman

Reputation: 89

Using matched pattern in awk

I want to print the matched pattern using awk. Not the field, not the line.

In vi, you can reuse the matched text in the substitution by surrounding it with escaped parens and referencing it with a backslash and a number, like this:

:s/bufid=\([0-9]*\)/buffer id is \1/

The part that matches between parens is remembered and can be used.

In perl, it is similar:

$_ = "Hello there, neighbor";
if (/\s(\w+),/) {             # memorize the word between space and comma
  print "the word was $1\n";  # the word was there
}

Is there any way I can do something similar with awk? I just want to extract the buffer id and print it, and only it.

The input line is XML, and will contain (among other things) 'bufId="123456"'. I want to print "123456"

so ...

awk < file.xml '/bufId="([0-9]*)"/ { print X; }'

What do I put where X is?

Can this even be done?

Upvotes: 2

Views: 234

Answers (4)

Allan

Reputation: 12438

Instead of going for an awk solution for this, I would highly recommend using an XML parser:

$ cat file.xml
<elems><elem bufId="123456"/></elems>

$ xmllint --xpath "concat('\"',string(//elem/@bufId),'\"')" file.xml
"123456"

$ xmllint --xpath "string(//elem/@bufId)" file.xml
123456

Use the first form if you want quotes in your output, the second if you don't.

Another valid solution is sed (if you really dislike XPath and XML parsers; since there are already many good awk solutions, I will add this one as well):

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/\1/gp' file.xml
123456

$ sed -n 's/^.*bufId="\([0-9]*\)".*$/"\1"/gp' file.xml
"123456"

Upvotes: 1

zzxyz

Reputation: 2981

Also with gawk (the third parameter to match() is specific to it; it receives the captured groups as an array):

~/test£ cat test
abc
~/test£ gawk '{ match($0, /a(.)(.)/, group)}{ print group[2] group[1]}' test
cb
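
Applied to the bufId case from the question, the same gawk feature might look like the sketch below. The function name buf_id_gawk is made up for illustration, and the code assumes GNU awk is installed as gawk; other awks will reject the third argument to match().

```shell
# buf_id_gawk is a hypothetical helper name, not something from the thread.
# It requires GNU awk: the array argument to match() is a gawk extension.
buf_id_gawk() {
  # m[1] holds the text captured by the first parenthesized group;
  # lines without a bufId attribute print nothing.
  gawk 'match($0, /bufId="([0-9]*)"/, m) { print m[1] }'
}
```

It reads from standard input, so it can be called as buf_id_gawk < file.xml. Only the first bufId on each line is printed.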

Upvotes: 1

karakfa

Reputation: 67497

With gawk:

awk '{print gensub(/.*bufId="([0-9]*)".*/,"\\1",1)}'

The trailing .* removes the rest of the line after the closing quote. If you want the result to be quoted, you have to capture the quotes as well.

Upvotes: 3

mattmc3

Reputation: 18325

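This seems like a close approximation of what you were after, though I'm not sure awk is going to be your best tool for this. The match includes the digits and the closing quote, so the substr length does not depend on how long the id is.

echo '<root><a bufId="123456"/></root>' | awk 'match($0, /bufId="[0-9]*"/) { print substr($0, RSTART+7, RLENGTH-8)}'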

This was a helpful starting point.
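
A match()-based extraction like this can also be looped so that it picks up every bufId on a line, not just the first. What follows is only a sketch in POSIX awk (no gawk extensions); the function name print_buf_ids is invented for illustration, and it assumes the attribute always has the form bufId="digits".

```shell
# print_buf_ids is a hypothetical helper name, not something from the thread.
# Reads text on stdin and prints every bufId value, one per line.
print_buf_ids() {
  awk '{
    line = $0
    # match() sets RSTART and RLENGTH for the leftmost match in line
    while (match(line, /bufId="[0-9]*"/)) {
      # skip the 7 characters of bufId=" and drop the closing quote
      print substr(line, RSTART + 7, RLENGTH - 8)
      # keep scanning in the text after this match
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

printf '%s\n' '<root><a bufId="123456"/><b bufId="42"/></root>' | print_buf_ids
```

The sample call prints 123456 and 42 on separate lines. Because the regex is not XML-aware, it would also match bufId="..." inside comments or text nodes; a real XML parser avoids that.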

Upvotes: 2
