Need help with greedy quantifier

I'm doing a simple search-and-replace in Perl, but I need some help. These are lines in a file:

1001(separator could be "anything")john-1001(separator could be "anything")mark
1001(separator could be "anything")mark-1001(separator could be "anything")john

I want to assign a new userID to john, like 2001. So this is the result I want:

2001($1)john-1001-mark
1001-mark-2001($1)john

My regex works fine when john is first, but when mark is first, it gets messed up.

Upvotes: 0

Answers (4)

Michael Carman

Reputation: 30841

It's almost impossible to answer this without having some idea of what the separator can be -- which characters, how many characters, etc. A non-greedy arbitrary separator would look like this:

s/\b1001\b(?=.*?\bjohn\b)/2001/

This replaces "1001" when it is followed by "john", matching the minimum number of intermediate characters. .*? is the non-greedy version of .*. However, a regex always matches if it possibly can, so this would still match the first "1001" in

1001-mark-1001-john
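
To see that behaviour concretely, here is a minimal sketch that runs the lazy-lookahead substitution over both orderings from the question. The helper name renumber_lazy and the hyphen separator are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the lazy-lookahead substitution to a copy of the line.
sub renumber_lazy {
    my ($line) = @_;
    (my $copy = $line) =~ s/\b1001\b(?=.*?\bjohn\b)/2001/;
    return $copy;
}

# john first: the intended ID is changed.
print renumber_lazy('1001-john-1001-mark'), "\n";   # 2001-john-1001-mark

# mark first: the lookahead stretches across "-mark-1001-", so mark's ID is changed.
print renumber_lazy('1001-mark-1001-john'), "\n";   # 2001-mark-1001-john
```

The second line shows why laziness alone does not help: the engine takes the leftmost "1001" for which the lookahead can succeed at all.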

In other words, it's not just a greediness problem. We need to define at least one of three things:

The characters the separator can contain.
The characters the separator cannot contain.
The number of characters in the separator.

If we assume that the separator cannot contain "word" characters (a-z, A-Z, 0-9, and underscore) we can get something workable:

s/\b1001\b(?=\W+?\bjohn\b)/2001/

The known parts ("1001" and "john") are bounded to prevent them from matching other strings with these substrings. (Thanks to Chas for noticing that edge case.)
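
As a quick check, here is a small sketch that runs the \W+? version against both orderings, a multi-character separator, and the two edge cases. The helper name renumber_nonword and the " :: " separator are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: the separator is assumed to be one or more non-word characters.
sub renumber_nonword {
    my ($line) = @_;
    (my $copy = $line) =~ s/\b1001\b(?=\W+?\bjohn\b)/2001/;
    return $copy;
}

print renumber_nonword($_), "\n" for (
    '1001-john-1001-mark',   # 2001-john-1001-mark
    '1001-mark-1001-john',   # 1001-mark-2001-john
    '1001 :: john',          # 2001 :: john
    '11001-john',            # unchanged: \b rejects the longer number
    '1001-johnny',           # unchanged: \b rejects the longer name
);
```

Because \W+ cannot cross the word "mark", the lookahead can no longer reach a later "john" from the wrong ID.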

Upvotes: 3

Chas. Owens

Reputation: 64919

Try this:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    s/\b1001-john\b/2001-john/;
    print;
}

__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny

The \b prevents it from matching things other than "1001-john". See the "Assertions" section of perldoc perlre for more information.

Hmmm, it sounds like you need a sexeger:

#!/usr/bin/perl

use strict;
use warnings;

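while (<DATA>) {
    # With no arguments in scalar context, reverse works on $_.
    my $s = reverse;
    # Pattern and replacement are written backwards: "nhoj" is "john", "1002" is "2001".
    $s =~ s/\bnhoj(.*?)1001\b/nhoj${1}1002/;
    $s = reverse $s;
    print $s;
}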

__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny

The basic idea of a sexeger is to reverse the string, use a reversed regex, and then reverse the result. The underlying problem is that .*? gives you the shortest string starting from the leftmost position where a match can begin, not the shortest possible string overall. Of course this will still have a problem with "1001-mark-2001-john" as the .*? will match "-mark-2001-". It is probably better to determine what the file format is and parse it rather than try to use a regex.
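
For what parsing might look like, here is a rough sketch. It assumes a format the question never confirms: each line is id, separator, name, separator, id, separator, name, and so on, with every separator being a run of non-word characters. The helper name renumber_by_parsing is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes lines look like  id SEP name SEP id SEP name ...
# where every SEP is a run of non-word characters.
sub renumber_by_parsing {
    my ($line, $name, $new_id) = @_;

    # A capturing group in split keeps the separators in the returned list,
    # so fields land on even indexes and separators on odd ones.
    my @tokens = split /(\W+)/, $line;

    # Visit each ID (indexes 0, 4, 8, ...) and check the name two slots later.
    for (my $i = 0; $i + 2 <= $#tokens; $i += 4) {
        $tokens[$i] = $new_id if $tokens[$i + 2] eq $name;
    }
    return join '', @tokens;
}

print renumber_by_parsing('1001-mark-1001-john', 'john', '2001'), "\n";   # 1001-mark-2001-john
print renumber_by_parsing('1001-mark-2001-john', 'john', '2001'), "\n";   # 1001-mark-2001-john
```

Since each ID is paired explicitly with the name that follows it, the "1001-mark-2001-john" line that trips up the sexeger leaves mark's ID alone here. If the real format differs from the assumed one, the index arithmetic would need to change.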

Upvotes: 3

Michael Myers

Reputation: 192005

I'm guessing from your comments that the separator is not always a hyphen, and can in fact be more than one character.

For this case, try:

s/\d+([^\d]*)john/2001$1john/

This will keep the separator between "1001" and "john" intact during the replacement. Note that no digits are permitted in the separator, so this will work even when "john" appears after "mark" (because "-mark-1001-" is not a valid separator).
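
A small sketch of that substitution in use, with a made-up " => " separator and a hypothetical helper name renumber_nodigit:

```perl
use strict;
use warnings;

# Hypothetical helper: the separator is assumed to contain no digits.
sub renumber_nodigit {
    my ($line) = @_;
    (my $copy = $line) =~ s/\d+([^\d]*)john/2001$1john/;
    return $copy;
}

print renumber_nodigit('1001-mark-1001-john'), "\n";          # 1001-mark-2001-john
print renumber_nodigit('1001 => john, 1001 => mark'), "\n";   # 2001 => john, 1001 => mark

# There are no \b anchors here, so a longer name is also rewritten.
print renumber_nodigit('1001-johnny'), "\n";                  # 2001-johnny
```

The last line suggests that adding word boundaries around the number and the name would be worthwhile if such names can occur in the file.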

Upvotes: 0

nonopolarity

Reputation: 151126

It can be something like:

$s = '1001-mark-1001-john';
$s =~ s/(\d+)(-john)/2001$2/i;
print $s;

Upvotes: -1
