Reputation: 5490
I would like to ignore whitespace and parse a pattern like (int, int) xx (int, int). For example:
import re
m = re.match(r"[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*xx[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*", " (2, 74) xx (5 ,6), physicist")
print (m.group(0)) # (2, 74) xx (5 ,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
As you can see, my pattern contains lots of [\s]* to represent zero or more whitespace characters. Is there a simpler way to write this pattern?
Upvotes: 1
Views: 5827
Reputation: 4139
The straightforward answer is no. Even though they are only whitespace, they are still characters, so they have to be part of the pattern. There are other ways to get the numbers, though, e.g.:
>>> re.findall(r'\d+', " (2, 74) xx (5 ,6), physicist")
['2', '74', '5', '6']
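As a rough sketch of how the findall result might be used, the strings can be converted to integers in one step. Note that this only collects digit runs; it does not check that the input follows the (int, int) xx (int, int) layout. The variable names here are just for illustration.

```python
import re

text = " (2, 74) xx (5 ,6), physicist"

# findall returns every run of digits, wherever it occurs in the string
numbers = [int(n) for n in re.findall(r"\d+", text)]
print(numbers)  # [2, 74, 5, 6]

# The same call also accepts input that does not follow the expected layout
loose = re.findall(r"\d+", "2 74 yy 5 6")
print(loose)  # ['2', '74', '5', '6']
```

So this fits best when the structure of the input is already known to be valid.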
Upvotes: 3
Reputation: 581
I don't know of a method baked into regex, but the easiest solution that comes to mind is using a simple string replace:
import re
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", " (2, 74) xx (5 ,6), physicist".replace(' ', ''))
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
You could also use regex to remove any kind of whitespace (not just spaces):
import re
s = re.sub(r'\s+', '', ' (2, 74) xx (5 ,6), physicist')
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", s)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
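One thing to keep in mind is that removing whitespace first also glues together digits that were separated by a space, so "7 4" would be read as 74. If that matters, a possible alternative (a sketch, not something built into re) is to keep the whitespace in the input and generate the pattern by joining its tokens with \s*. The names tokens and pattern are made up for this example.

```python
import re

# Tokens of the pattern, in order; \s* is inserted around and between them
tokens = [r"\(", r"(\d+)", ",", r"(\d+)", r"\)", "xx",
          r"\(", r"(\d+)", ",", r"(\d+)", r"\)"]
pattern = re.compile(r"\s*" + r"\s*".join(tokens) + r"\s*")

m = pattern.match(" (2, 74) xx (5 ,6), physicist")
print(m.groups())  # ('2', '74', '5', '6')

# Digits separated by a space are not merged, so this input is rejected
rejected = pattern.match("(7 4, 1) xx (2, 3)")
print(rejected)  # None
```

The compiled pattern is equivalent to writing \s* by hand between every token, but the token list stays readable.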
Upvotes: 5
Reputation: 156
If you want to simplify your specific pattern, you could remove all whitespace in a separate step beforehand, since it is not relevant to your pattern.
Example:
import re
text = ' (2, 74) xx (5 ,6), physicist'  # named text to avoid shadowing the built-in input()
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", text.replace(' ', ''))
print(m.groups())  # ('2', '74', '5', '6')
Upvotes: 2
Reputation: 3059
I think all you want is to get the four integers, so you can delete all whitespace first and then match:
import re
a = '( 2 , 74 ) xx (5 , 6 )'
b = re.sub(r'\s+','',a)
m = re.match(r'\((\d+),(\d+)\)xx\((\d+),(\d+)\)',b)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
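If this is needed in more than one place, the two steps could be wrapped in a small helper. This is only a sketch: parse_pairs and PAIR_PATTERN are hypothetical names, and returning None for non-matching input is an assumption about what the caller wants.

```python
import re

# Pattern for the whitespace-free form: (int,int)xx(int,int)
PAIR_PATTERN = re.compile(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)")

def parse_pairs(text):
    """Return the four integers as a tuple, or None if the text does not match."""
    compact = re.sub(r"\s+", "", text)  # drop every whitespace character
    m = PAIR_PATTERN.match(compact)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

print(parse_pairs("( 2 , 74 ) xx (5 , 6 )"))  # (2, 74, 5, 6)
print(parse_pairs("no numbers here"))         # None
```

Checking for None avoids the AttributeError that m.group(...) raises when the input does not match.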
Upvotes: 2