Reputation:
I'm doing a simple search-and-replace in Perl, but I need some help. These are lines in a file:
1001(separator could be "anything")john-1001(separator could be "anything")mark
1001(separator could be "anything")mark-1001(separator could be "anything")john
I want to assign a new userID to john, such as 2001. So this is the result I want:
2001($1)john-1001-mark
1001-mark-2001($1)john
My regex works fine when john is first, but when mark is first, it gets messed up.
Upvotes: 0
Views: 433
Reputation: 30841
It's almost impossible to answer this without having some idea of what the separator can be -- which characters, how many characters, etc. A non-greedy arbitrary separator would look like this:
s/\b1001\b(?=.*?\bjohn\b)/2001/
This replaces "1001" when followed by "john" while matching the minimum number of intermediate characters. .*? is the non-greedy version of .*. However, regexes always match if possible, so this would still replace the first "1001" in
1001-mark-1001-john
In other words, it's not just a greediness problem. We need to define at least one of three things:
If we assume that the separator cannot contain "word" characters (letters, digits, and underscore), we can get something workable:
s/\b1001\b(?=\W+?\bjohn\b)/2001/
The known parts ("1001" and "john") are bounded to prevent them from matching other strings with these substrings. (Thanks to Chas for noticing that edge case.)
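To see the difference between the two patterns side by side, here is a minimal sketch. The helper name renumber and the sample lines are assumptions made for illustration, with a hyphen standing in for the unknown separator:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply one of the two substitutions to a copy of the line.
sub renumber {
    my ($line, $strict) = @_;
    if ($strict) {
        # Separator may contain only non-word characters.
        $line =~ s/\b1001\b(?=\W+?\bjohn\b)/2001/;
    }
    else {
        # Arbitrary separator: the lookahead is free to skip over "-mark-1001-".
        $line =~ s/\b1001\b(?=.*?\bjohn\b)/2001/;
    }
    return $line;
}

for my $line ('1001-john-1001-mark', '1001-mark-1001-john') {
    printf "%-20s  arbitrary: %-20s  non-word: %s\n",
        $line, renumber($line, 0), renumber($line, 1);
}
```

With the arbitrary-separator pattern the second line comes out as 2001-mark-1001-john, because the lookahead reaches past mark. With the \W+? pattern it comes out as 1001-mark-2001-john.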
Upvotes: 3
Reputation: 64919
Try this:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
s/\b1001-john\b/2001-john/;
print;
}
__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny
The \b prevents it from matching things other than "1001-john". See the "Assertions" section of perldoc perlre for more information.
Hmmm, it sounds like you need a sexeger:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
my $s = reverse;
$s =~ s/\bnhoj(.*?)1001\b/nhoj${1}1002/;
$s = reverse $s;
print $s;
}
__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny
The basic idea of a sexeger is to reverse the string, use a reversed regex, and then reverse the result. The problem is that .*? gives you the shortest string from the first match, not the shortest possible string. Of course this will still have a problem with "1001-mark-2001-john", as the .*? will match "-mark-2001-". It is probably better to determine what the file format is and parse it rather than try to use a regex.
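One way to read that last suggestion is sketched below. It rests on assumptions the question does not confirm: every field is a run of digits, then a separator of non-word characters, then a name starting with a letter. The helper name assign_id is made up for the example. It still uses a regex to split the line into fields, but it compares each ID and name as a whole instead of searching across fields:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite the ID only in fields whose ID and name both match exactly.
# Assumed field layout: <digits><non-word separator><name>.
sub assign_id {
    my ($line, $old_id, $name, $new_id) = @_;
    $line =~ s{(\d+)(\W+)([A-Za-z]\w*)}
              { ($1 eq $old_id && $3 eq $name ? $new_id : $1) . $2 . $3 }ge;
    return $line;
}

while (my $line = <DATA>) {
    print assign_id($line, '1001', 'john', '2001');
}

__DATA__
1001-john-1001-mark
1001-mark-1001-john
1001-mark-2001-john
11001-john
1001-johnny
```

Because each field is checked on its own, "1001-mark-2001-john" is left alone, and so are "11001-john" and "1001-johnny".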
Upvotes: 3
Reputation: 192005
I'm guessing from your comments that the separator is not always a hyphen, and can in fact be more than one character.
For this case, try:
s/\d+([^\d]*)john/2001$1john/
This will keep the separator between "1001" and "john" intact during the replacement. Note that no digits are permitted in the separator, so this will work even when "john" appears after "mark" (because "-mark-1001-" is not a valid separator).
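A quick check of that substitution, as a sketch: the sample lines are assumptions, with " :: " standing in for a multi-character separator that contains no digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data; " :: " is a stand-in for any separator without digits.
my @lines = (
    '1001 :: john-1001 :: mark',
    '1001 :: mark-1001 :: john',
);

my @results;
for my $line (@lines) {
    # \D is equivalent to [^\d]; ${1} is $1 written so it cannot run into "john".
    (my $copy = $line) =~ s/\d+(\D*)john/2001${1}john/;
    push @results, $copy;
    print "$copy\n";
}
```

In both lines only the number directly in front of john changes. Note that \d+ replaces whatever number precedes john, not only 1001, and that the pattern would also match a name such as johnny.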
Upvotes: 0
Reputation: 151126
It can be something like this:
$s = '1001-mark-1001-john';
$s =~ s/(\d+)(-john)/2001$2/i;
print $s;
Upvotes: -1