shiro

Reputation: 944

Using the Linux grep command with Perl regex (-P) and capturing groups

I've done some research on the subject and didn't quite find the perfect solution. For example, I have a string inside a variable:

var="a1b1c2"

Now what I want to do is match only "a" followed by any digit, but I only want it to return the digit after "a". To match it, I can use a rule such as

'a\d'

and since I only need the digit, I tried with

'a(\d)'

and maybe it did capture it somewhere, but I don't know where; the output here is still "a1".

I also tried a non-capturing group to ignore the "a" in the output, but it has no effect with Perl regex:

'(?:a)\d'

for reference, this is the full command in my terminal:

[root@host ~]# var="a1b1c2"
[root@host ~]# echo $var |grep -oP "a(\d)"
a1 <--output

It's probably also possible without -P (using some non-Perl regex format). I'm thankful for every answer :)

EDIT: using

\K

is not really the solution, since I don't necessarily need the last part of the match.

EDIT2: I need to be able to get any part of the match, for instance:

[root@host ~]# var="a1b1c2"
[root@host ~]# echo $var |grep -oP "(a)\d"
a1 <--output
but the wanted output in this case would be "a"

EDIT3: The problem is nearly solved using "look-behind assertions" such as:

(?<=a)\d

This does not return the letter "a", only the digit following it, but the look-behind needs a fixed length; for example, it cannot be used as:

(?<=\w+)\d

EDIT4: The best way so far is either using Perl directly or combining look-behind assertions with \K, but this still seems to have some limitations. For example:

1234_foo_1234_bar
1234567_foo_123456789_bar
1_foo_12345_bar

If "foo" and "bar" are placeholders for words that don't always have the same length,
there is no way to match all of the above examples and output "foobar": a look-behind
can't be used since the number between them doesn't have a fixed length, and \K can't be used since we need "foo".

Any further suggestions are still appreciated :)

Upvotes: 8

Views: 13303

Answers (3)

hwnd

Reputation: 70732

After some testing I found out that the pattern inside the look-behind assertion needs to be fixed length (something like (?<=\w+)something will not work). Any suggestions?

I posted this answer previously and deleted it because you stated it did not fit your needs:

Most of the time, you can avoid variable-length lookbehinds by using \K. This resets the starting point of the reported match, so any previously consumed characters are no longer included (it throws away everything that has been matched up to that point).

The key difference between \K and a lookbehind is that a lookbehind does not allow variable-length quantifiers such as + or *: the length of what you are looking for must be fixed. \K, on the other hand, can be placed anywhere in a pattern, so you are able to use any quantifiers before it.

As you can see in the example below, using a quantifier in the lookbehind will not work.

echo 'foosomething' | grep -Po '(?<=\w+)something'
#=> grep: lookbehind assertion is not fixed length

So you could do:

echo 'foosomething' | grep -Po '\w+\Ksomething'
#=> something

To get only the substring between two patterns, you can add a positive lookahead into the mix.

echo 'foosomethingbar' | grep -Po 'foo\K.*?(?=bar)'
#=> something

Or use a fixed-length lookbehind combined with a lookahead.

echo 'foosomethingbar' | grep -Po '(?<=foo).*?(?=bar)'
#=> something
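
For the case where two separate pieces of a line are wanted (such as "foo" and "bar" in the question's EDIT4), grep -o is not a good fit, since it prints one contiguous match per hit. A minimal sketch using sed capture groups instead, assuming every line has the shape digits_word_digits_word; the function name extract_words is made up for illustration:

```shell
# extract_words is a hypothetical helper name, not part of any tool.
# Assumes each input line has the shape <digits>_<word>_<digits>_<word>.
extract_words() {
  # -n plus the trailing p prints only lines where the substitution succeeded;
  # \1 and \2 are the two captured words, written back without the numbers.
  sed -nE 's/^[0-9]+_([[:alpha:]]+)_[0-9]+_([[:alpha:]]+)$/\1\2/p'
}

printf '%s\n' 1234_foo_1234_bar 1234567_foo_123456789_bar 1_foo_12345_bar | extract_words
#=> foobar
#=> foobar
#=> foobar
```

This also works without -P, because the replacement side of sed can refer to any capture group.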

Upvotes: 28

Slade

Reputation: 1364

The pattern (?<=a)\d uses a look-behind assertion to only print a digit following the letter 'a'. This works with GNU grep -Po, ack -o, and pcregrep -o. The assertion is zero width, so it isn't included in the match.
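
As a quick sketch with the variable from the question (assuming GNU grep built with PCRE support), the look-behind returns only the digit, and a look-ahead can be used the other way round to return only the letter:

```shell
var="a1b1c2"

# Look-behind: "a" must precede the digit but is not part of the match.
printf '%s\n' "$var" | grep -oP '(?<=a)\d'
#=> 1

# Look-ahead: a digit must follow "a" but is not part of the match.
printf '%s\n' "$var" | grep -oP 'a(?=\d)'
#=> a
```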

Upvotes: 2

TLP

Reputation: 67910

You can use Perl directly, accessing environment variables through the %ENV hash (the variable must be exported for Perl to see it):

perl -lwe 'print $ENV{var} =~ /a(\d+)/;'

It prints only the capture, i.e. the part matched inside the parentheses.

Upvotes: 1
