Basj

Reputation: 46463

Matching and replacing floats with re.sub

Surprisingly, the output of

import re
s = "a=2323.232323 b=23.23 c=112  d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*[.])?\d+)'
tobereplacedwith = r'thisisb=\2 thisisa=\1 thisisc=\3'
print(re.sub(pattern, tobereplacedwith, s))

is

thisisb=2323. thisisa=2323.232323 thisisc=23.23  d=12

Why doesn't this produce

thisisb=23.23 thisisa=2323.232323 thisisc=112  d=12

?

Upvotes: 1

Views: 1082

Answers (3)

user557597

Reputation:

This is your regex with its current groupings, formatted and tested:

 a=
 (                             # (1 start)
      [-+]? 
      ( \d* [.] )?                  # (2)
      \d+ 
 )                             # (1 end)
 \ b=
 (                             # (3 start)
      [-+]? 
      ( \d* [.] )?                  # (4)
      \d+ 
 )                             # (3 end)
 \ c=
 (                             # (5 start)
      [-+]? 
      ( \d* [.] )?                  # (6)
      \d+ 
 )                             # (5 end)

Output:

 **  Grp 0 -  ( pos 0 , len 27 ) 
a=2323.232323 b=23.23 c=112  
 **  Grp 1 -  ( pos 2 , len 11 ) 
2323.232323  
 **  Grp 2 -  ( pos 2 , len 5 ) 
2323.  
 **  Grp 3 -  ( pos 16 , len 5 ) 
23.23  
 **  Grp 4 -  ( pos 16 , len 3 ) 
23.  
 **  Grp 5 -  ( pos 24 , len 3 ) 
112  
 **  Grp 6 -  NULL 

You don't need the optional subgroups to capture anything.
After converting them to non-capturing (cluster) groups:

 # a=([-+]?(?:\d*[.])?\d+) b=([-+]?(?:\d*[.])?\d+) c=([-+]?(?:\d*[.])?\d+)

 a=
 (                             # (1 start)
      [-+]? 
      (?: \d* [.] )?
      \d+ 
 )                             # (1 end)
 \ b=
 (                             # (2 start)
      [-+]? 
      (?: \d* [.] )?
      \d+ 
 )                             # (2 end)
 \ c=
 (                             # (3 start)
      [-+]? 
      (?: \d* [.] )?
      \d+ 
 )                             # (3 end)

Output:

 **  Grp 0 -  ( pos 0 , len 27 ) 
a=2323.232323 b=23.23 c=112  
 **  Grp 1 -  ( pos 2 , len 11 ) 
2323.232323  
 **  Grp 2 -  ( pos 16 , len 5 ) 
23.23  
 **  Grp 3 -  ( pos 24 , len 3 ) 
112  
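
The same fix can be checked in Python itself. This is a minimal sketch, assuming the input string from the question; the helper name `number` and the variable names are only for illustration and are not part of the original post.

```python
import re

s = "a=2323.232323 b=23.23 c=112  d=12"

# One capturing group per number; the optional "digits + dot" prefix uses
# (?:...) so it does not consume a group number of its own.
number = r'([-+]?(?:\d*[.])?\d+)'
pattern = 'a=' + number + ' b=' + number + ' c=' + number
replacement = r'thisisb=\2 thisisa=\1 thisisc=\3'

match = re.search(pattern, s)
groups = match.groups()
result = re.sub(pattern, replacement, s)

print(groups)   # ('2323.232323', '23.23', '112')
print(result)   # thisisb=23.23 thisisa=2323.232323 thisisc=112  d=12
```

With only three capturing groups left, \1, \2 and \3 refer to a, b and c as the question expected.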

Upvotes: 1

Mike Covington

Reputation: 2157

When your capture groups get complex, it is sometimes easier to use named capture groups. For example:

pattern = r'a=(?P<thisisa>[-+]?(\d*[.])?\d+) b=(?P<thisisb>[-+]?(\d*[.])?\d+) c=(?P<thisisc>[-+]?(\d*[.])?\d+)'
tobereplacedwith = r'thisisb=\g<thisisb> thisisa=\g<thisisa> thisisc=\g<thisisc>'

To create a capture group named foo in Python, you use (?P<foo>...). To create a back-reference to it inside the pattern, you use (?P=foo). To get its contents in the replacement string, you use \g<foo>.
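
A short sketch of the three forms, assuming the input string from the question. The sample string 'hello hello world' and the group name `word` are made up here to illustrate the back-reference and do not come from the original post.

```python
import re

s = "a=2323.232323 b=23.23 c=112  d=12"
pattern = (r'a=(?P<thisisa>[-+]?(\d*[.])?\d+) '
           r'b=(?P<thisisb>[-+]?(\d*[.])?\d+) '
           r'c=(?P<thisisc>[-+]?(\d*[.])?\d+)')
replacement = r'thisisb=\g<thisisb> thisisa=\g<thisisa> thisisc=\g<thisisc>'

# \g<name> in the replacement string: the inner unnamed groups still exist,
# but they no longer matter because nothing is referenced by number.
result = re.sub(pattern, replacement, s)
print(result)   # thisisb=23.23 thisisa=2323.232323 thisisc=112  d=12

# Named groups can also be read from the match object.
named = re.search(pattern, s).groupdict()
print(named)    # {'thisisa': '2323.232323', 'thisisb': '23.23', 'thisisc': '112'}

# (?P=name) is a back-reference inside the pattern: it matches the same
# text that the named group matched earlier (hypothetical sample input).
repeated = re.search(r'(?P<word>\w+) (?P=word)', 'hello hello world')
print(repeated.group('word'))   # hello
```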

Upvotes: 2

Joe Young

Reputation: 5875

From the perlretut:

If the groupings in a regexp are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc.

Source: http://perldoc.perl.org/perlretut.html

Python's regex syntax is modeled on Perl's, so groups are numbered the same way.

So:

a=([-+]?(\d*[.])?\d+) outer capture group i.e. 2323.232323 == Group 1

a=([-+]?(\d*[.])?\d+) inner capture group i.e. (\d*[.]) i.e. 2323. == Group 2

b=([-+]?(\d*[.])?\d+) outer capture group i.e. 23.23 == Group 3
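
The numbering can be confirmed by listing every group of the original pattern. This is a small sketch using the question's pattern and input; the variable names `numbered` and `groups` are only for illustration.

```python
import re

s = "a=2323.232323 b=23.23 c=112  d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*[.])?\d+)'

match = re.search(pattern, s)
groups = match.groups()

# Groups are numbered by the position of their opening parenthesis,
# counting from the left, so each inner group follows its outer group.
numbered = list(enumerate(groups, start=1))
for index, text in numbered:
    print(index, repr(text))
# 1 '2323.232323'
# 2 '2323.'
# 3 '23.23'
# 4 '23.'
# 5 '112'
# 6 None
```

Group 6 is None because c=112 has no decimal point, so the optional inner group never takes part in the match.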

To get the output you want, try this:

import re
s = "a=2323.232323 b=23.23 c=112  d=12"
pattern = r'a=([-+]?(\d*[.])?\d+) b=([-+]?(\d*[.])?\d+) c=([-+]?(\d*)([.]\d*)?)'
# \5 is the whole c value; \6 would hold only its integer part
tobereplacedwith = r'thisisb=\3 thisisa=\1 thisisc=\5'
print(re.sub(pattern, tobereplacedwith, s))

Output:

thisisb=23.23 thisisa=2323.232323 thisisc=112  d=12

Upvotes: 2
