SoftTimur

Reputation: 5490

Ignore white spaces in a regular expression

I would like to ignore whitespace and parse a pattern like (int, int) xx (int, int). For example:

import re
m = re.match(r"[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*xx[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*", "   (2,  74) xx   (5  ,6), physicist")
print (m.group(0)) #    (2,  74) xx   (5  ,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6

As you can see, my pattern contains lots of [\s]* to represent zero or more whitespace characters. Is there a simpler way to write this pattern?

Upvotes: 1

Views: 5827

Answers (4)

fronthem

Reputation: 4139

The straightforward answer is no. Whitespace characters are still characters, so they are part of the pattern and have to be matched explicitly. There are a few ways around this:

  1. Preprocess your string by removing unwanted white spaces.
  2. Find another way to express your pattern.
  3. Use alternative methods for matching.

e.g.

>>> re.findall(r'\d+', "   (2,  74) xx   (5  ,6), physicist")
['2', '74', '5', '6']
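For option 2, one possible way to keep the pattern readable is to write it with single spaces wherever optional whitespace may occur, then replace every space with `\s*` before compiling. This is only a sketch: the names `TEMPLATE` and `PATTERN` are made up for illustration, and it assumes the pattern never needs to match a literal space.

```python
import re

# Hypothetical template: each single space marks a spot where
# optional whitespace is allowed in the input.
TEMPLATE = r" \( (\d+) , (\d+) \) xx \( (\d+) , (\d+) \) "

# Turn every space in the template into \s* (zero or more whitespace).
PATTERN = re.compile(TEMPLATE.replace(" ", r"\s*"))

m = PATTERN.match("   (2,  74) xx   (5  ,6), physicist")
if m:
    print(m.groups())  # ('2', '74', '5', '6')
```

The input string is left untouched here, so whitespace inside a token (for example between two digits) still prevents a match.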

Upvotes: 3

CaffeineFueled

Reputation: 581

I don't know of a method baked into regex, but the easiest solution that comes to mind is using a simple string replace:

import re
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", "   (2,  74) xx   (5  ,6), physicist".replace(' ', ''))
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6

You could also use regex to remove any kind of whitespace (not just spaces):

import re
s = re.sub(r'\s+', '', '   (2,  74) xx   (5  ,6), physicist')
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", s)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
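One caveat with stripping whitespace first, shown as a small sketch: it also joins tokens that were separated in the input, so strings that the original pattern rejects can start to match. The sample string `odd` and the variable names below are made up for illustration; `strict` is written with `\s*`, which is equivalent to the `[\s]*` used in the question.

```python
import re

# Equivalent to the question's pattern: whitespace allowed only between tokens.
strict = re.compile(
    r"\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*xx\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*"
)
# Pattern used after all whitespace has been stripped from the input.
compact = re.compile(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)")

odd = "(2, 7 4) x x (5, 6)"  # hypothetical input with spaces inside tokens

strict_match = strict.match(odd)
compact_match = compact.match(re.sub(r"\s+", "", odd))

print(strict_match)            # None: "7 4" and "x x" are not valid tokens
print(compact_match.groups())  # ('2', '74', '5', '6')
```

Whether that difference matters depends on how strictly the input needs to be validated.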

Upvotes: 5

Roy D'atze

Reputation: 156

If you want to simplify your specific pattern, you could remove all whitespace in a separate step beforehand, since it is not relevant to your pattern.

Example:

import re
# Named text rather than input to avoid shadowing the built-in input()
text = '   (2,  74) xx   (5  ,6), physicist'
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", text.replace(' ', ''))

Upvotes: 2

176coding

Reputation: 3059

I think all you want is to get the four integers, so you can delete all whitespace and then match:

import re
a = '(  2 , 74 ) xx (5       , 6 )'
b = re.sub(r'\s+','',a)
m = re.match(r'\((\d+),(\d+)\)xx\((\d+),(\d+)\)',b)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6

Upvotes: 2
