Reputation: 49198
Is it possible to write a single Python regular expression that can be applied to a multi-line string and change all occurrences of "foo" to "bar", but only on lines beginning with "#"?
I was able to get this working in Perl, using Perl's \G regular expression assertion, which matches at the position where the previous match ended. However, Python's re module doesn't appear to support this.
Here's the Perl solution, in case it helps:
my $x =<<EOF;
# foo
foo
# foo foo
EOF
$x =~ s{
( # begin capture
(?:\G|^\#) # last match or start of string plus hash
.*? # followed by anything, non-greedily
) # end capture
foo
}
{$1bar}xmg;
print $x;
The proper output, of course, is:
# bar
foo
# bar bar
Can this be done in Python?
Upvotes: 2
Views: 883
Reputation: 86494
It looked pretty easy to do with a regular expression:
>>> import re
>>> text = """line 1
... line 2
... Barney Rubble Cutherbert Dribble and foo
... line 4
... # Flobalob, bing, bong, foo and brian
... line 6"""
>>> regexp = re.compile('^(#.+)foo', re.MULTILINE)
>>> print re.sub(regexp, '\g<1>bar', text)
line 1
line 2
Barney Rubble Cutherbert Dribble and foo
line 4
# Flobalob, bing, bong, bar and brian
line 6
But then trying your example text is not so good:
>>> text = """# foo
... foo
... # foo foo"""
>>> regexp = re.compile('^(#.+)foo', re.MULTILINE)
>>> print re.sub(regexp, '\g<1>bar', text)
# bar
foo
# foo bar
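The second foo survives because the greedy .+ consumes the line up to the last foo, and re.sub does not rescan text it has already matched, so each line gets at most one replacement. A minimal sketch of one workaround, assuming Python 3 and assuming the replacement never reintroduces foo (otherwise the loop would not terminate): repeat the substitution until nothing changes. The names pattern and count are illustrative only.

```python
import re

text = """# foo
foo
# foo foo"""

# Non-greedy prefix: each pass rewrites the first remaining foo on every # line.
pattern = re.compile(r'^(#.*?)foo', re.MULTILINE)

count = 1
while count:
    # subn returns the new string and the number of substitutions made.
    text, count = pattern.subn(r'\g<1>bar', text)

print(text)
```

This is several passes over the text rather than one, so it is a workaround for the missing \G, not an equivalent of it.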
So, try this:
>>> regexp = re.compile('(^#|\g.+)foo', re.MULTILINE)
>>> print re.sub(regexp, '\g<1>bar', text)
# foo
foo
# foo foo
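That leaves the text unchanged, so it doesn't work. As far as I can tell, \g is only documented for replacement strings (as in \g<1>), not as a pattern anchor like Perl's \G.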
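For what it's worth, a single re.sub call can do the job without \G if the replacement is a function rather than a string. A minimal sketch, assuming Python 3: match each whole line that starts with #, then do a plain string replace inside the callback. The variable names are illustrative.

```python
import re

text = """# foo
foo
# foo foo"""

# Match each whole line that starts with '#', then fix it up in a callback.
result = re.sub(
    r'^#.*$',
    lambda m: m.group(0).replace('foo', 'bar'),
    text,
    flags=re.MULTILINE,
)

print(result)
```

The regex only decides which lines qualify; the replacement of every foo on such a line is left to str.replace, so no anchor on the previous match is needed.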
Moral: don't try to code after a couple of beers.
Upvotes: 1
Reputation: 3103
\g is in the Python docs, but there it is a group reference for replacement strings, not an equivalent of Perl's \G anchor:
"In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE."
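To make the quoted passage concrete, here is a small sketch (Python 3, with made-up sample strings) of the three replacement-string forms it describes. All of them refer back to groups in the current match; none of them anchors a pattern at the end of the previous match.

```python
import re

# \g<name> refers to a group defined with (?P<name>...).
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)', r'\g<second> \g<first>', 'hello world')

# \g<1>0 is group 1 followed by a literal '0'; r'\10' would mean group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')

# \g<0> is the entire match.
wrapped = re.sub(r'foo', r'[\g<0>]', 'foo bar')

print(swapped, padded, wrapped)
```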
Upvotes: 0
Reputation: 181820
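lines = mystring.split('\n')
for i, line in enumerate(lines):
    if line.startswith('#'):
        # Assign back into the list; rebinding `line` alone changes nothing.
        lines[i] = line.replace('foo', 'bar')
result = '\n'.join(lines)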
No need for a regex.
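If this is needed in more than one place, the same idea can be wrapped in a function. A minimal sketch, assuming Python 3; the function name replace_in_comment_lines and its parameters are hypothetical, and it uses splitlines(keepends=True) so that the original line endings survive.

```python
def replace_in_comment_lines(text, old='foo', new='bar'):
    """Replace old with new, but only on lines that start with '#'."""
    # keepends=True preserves each line's original terminator.
    lines = text.splitlines(keepends=True)
    return ''.join(
        line.replace(old, new) if line.startswith('#') else line
        for line in lines
    )

result = replace_in_comment_lines("# foo\nfoo\n# foo foo\n")
print(result, end='')
```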
Upvotes: 3