Reputation: 46463
Surprisingly, the output of
import re
s = "a=2323.232323 b=23.23 c=112 d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*[.])?\d+)'
tobereplacedwith = r'thisisb=\2 thisisa=\1 thisisc=\3'
print(re.sub(pattern, tobereplacedwith, s))
is
thisisb=2323. thisisa=2323.232323 thisisc=23.23 d=12
Why doesn't this produce
thisisb=23.23 thisisa=2323.232323 thisisc=112 d=12
?
Upvotes: 1
Views: 1082
Reputation:
This is your regex with its current groupings, formatted and tested:
a=
( # (1 start)
[-+]?
( \d* [.] )? # (2)
\d+
) # (1 end)
\ b=
( # (3 start)
[-+]?
( \d* [.] )? # (4)
\d+
) # (3 end)
\ c=
( # (5 start)
[-+]?
( \d* [.] )? # (6)
\d+
) # (5 end)
Output:
** Grp 0 - ( pos 0 , len 27 )
a=2323.232323 b=23.23 c=112
** Grp 1 - ( pos 2 , len 11 )
2323.232323
** Grp 2 - ( pos 2 , len 5 )
2323.
** Grp 3 - ( pos 16 , len 5 )
23.23
** Grp 4 - ( pos 16 , len 3 )
23.
** Grp 5 - ( pos 24 , len 3 )
112
** Grp 6 - NULL
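The same numbering can be reproduced in Python. This is a minimal sketch using the question's string and pattern; the loop and the variable name `m` are only illustrative.

```python
import re

s = "a=2323.232323 b=23.23 c=112 d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*[.])?\d+)'

m = re.search(pattern, s)

# Groups are numbered by the position of their opening parenthesis,
# so the nested (\d*[.]) subgroups take numbers 2, 4 and 6.
for i in range(m.re.groups + 1):
    # A group that did not participate in the match reports start -1 and value None.
    print(i, m.start(i), repr(m.group(i)))
```

Group 2 is the `2323.` prefix of `a`, which is why `\2` in the replacement does not produce the value of `b`.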
You don't need the optional subgroups to capture; they are what shifts the numbering.
After converting them to non-capturing (cluster) groups:
# a=([-+]?(?:\d*[.])?\d+) b=([-+]?(?:\d*[.])?\d+) c=([-+]?(?:\d*[.])?\d+)
a=
( # (1 start)
[-+]?
(?: \d* [.] )?
\d+
) # (1 end)
\ b=
( # (2 start)
[-+]?
(?: \d* [.] )?
\d+
) # (2 end)
\ c=
( # (3 start)
[-+]?
(?: \d* [.] )?
\d+
) # (3 end)
Output:
** Grp 0 - ( pos 0 , len 27 )
a=2323.232323 b=23.23 c=112
** Grp 1 - ( pos 2 , len 11 )
2323.232323
** Grp 2 - ( pos 16 , len 5 )
23.23
** Grp 3 - ( pos 24 , len 3 )
112
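Applied to the original substitution, the non-capturing version might look like the sketch below. It reuses the question's string and replacement; only the `(?:...)` groups differ, and the variable name `result` is illustrative.

```python
import re

s = "a=2323.232323 b=23.23 c=112 d=12"

# (?:...) groups the optional "digits plus dot" prefix without capturing it,
# so only the three outer groups are numbered: 1 (a), 2 (b), 3 (c).
pattern = r'a=([-+]?(?:\d*[.])?\d+) b=([-+]?(?:\d*[.])?\d+) c=([-+]?(?:\d*[.])?\d+)'
tobereplacedwith = r'thisisb=\2 thisisa=\1 thisisc=\3'

result = re.sub(pattern, tobereplacedwith, s)
print(result)  # thisisb=23.23 thisisa=2323.232323 thisisc=112 d=12
```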
Upvotes: 1
Reputation: 2157
When your capture groups get complex, it is sometimes easier to use named capture groups. For example:
pattern = r'a=(?P<thisisa>[-+]?(\d*[.])?\d+) b=(?P<thisisb>[-+]?(\d*[.])?\d+) c=(?P<thisisc>[-+]?(\d*[.])?\d+)'
tobereplacedwith = r'thisisb=\g<thisisb> thisisa=\g<thisisa> thisisc=\g<thisisc>'
In Python, to create a capture group named foo, you use (?P<foo>...). To create a back-reference to it inside the pattern, you use (?P=foo). To get its contents in the replacement string, you use \g<foo>.
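A runnable sketch of the named-group approach follows, using Python's `(?P<name>...)` syntax. The inner groups are made non-capturing here for tidiness (an assumption, since names work either way), and the `repeated` example with the sample sentence is hypothetical, added only to show `(?P=name)`.

```python
import re

s = "a=2323.232323 b=23.23 c=112 d=12"

# Each value is captured under a name, so the replacement does not
# depend on how many other groups the pattern contains.
pattern = (r'a=(?P<thisisa>[-+]?(?:\d*[.])?\d+) '
           r'b=(?P<thisisb>[-+]?(?:\d*[.])?\d+) '
           r'c=(?P<thisisc>[-+]?(?:\d*[.])?\d+)')
tobereplacedwith = r'thisisb=\g<thisisb> thisisa=\g<thisisa> thisisc=\g<thisisc>'

result = re.sub(pattern, tobereplacedwith, s)
print(result)  # thisisb=23.23 thisisa=2323.232323 thisisc=112 d=12

# (?P=word) is a back-reference inside the pattern: it matches the same
# text that the group named "word" matched earlier.
repeated = re.search(r'(?P<word>\w+) (?P=word)', 'it is is fine')
print(repeated.group('word'))  # is
```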
Upvotes: 2
Reputation: 5875
From the perlretut:
If the groupings in a regexp are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc.
Source: http://perldoc.perl.org/perlretut.html
Python's re syntax is modeled on Perl's, and it numbers nested groups the same way.
So:
a=([-+]?(\d*[.])?\d+)
outer capture group, i.e. 2323.232323
== Group 1
a=([-+]?(\d*[.])?\d+)
inner capture group, i.e. (\d*[.]), which matches 2323.
== Group 2
b=([-+]?(\d*[.])?\d+)
outer capture group, i.e. 23.23
== Group 3
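The left-to-right rule is easy to check on a smaller pattern. The pattern and input below are made up purely for illustration:

```python
import re

# Opening parentheses, left to right: 1 = ((a)(b(c))), 2 = (a), 3 = (b(c)), 4 = (c)
m = re.match(r'((a)(b(c)))', 'abc')
print(m.groups())  # ('abc', 'a', 'bc', 'c')
```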
To get the output you want, try this:
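import re
s = "a=2323.232323 b=23.23 c=112 d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*)([.]\d*)?)'
# Outer groups are 1 (a), 3 (b) and 5 (c); 2, 4, 6 and 7 are the nested subgroups.
# \5 is the whole c value; \6 would hold only its unsigned integer part.
tobereplacedwith = r'thisisb=\3 thisisa=\1 thisisc=\5'
print(re.sub(pattern, tobereplacedwith, s))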
Output:
thisisb=23.23 thisisa=2323.232323 thisisc=112 d=12
Upvotes: 2