Regular expression to match a phrase containing a Unicode character

Question

I am trying to parse the indicated range in the following sentence with regular expressions in Python (the re module), but have had no luck so far:

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

I'm trying to match

[20] – [22]

where the problem appears to be that the dash is not the usual ASCII hyphen-minus (-) but a Unicode character, the en dash (–, U+2013).
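To check which character is actually in the string, the standard-library unicodedata module can report each non-ASCII character's code point and official name. A minimal sketch, assuming Python 3 and the sentence exactly as quoted above:

```python
import unicodedata

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

# Collect the code point and Unicode name of every non-ASCII character in body.
non_ascii = [(f'U+{ord(ch):04X}', unicodedata.name(ch)) for ch in body if ord(ch) > 127]
print(non_ascii)  # [('U+2013', 'EN DASH')]
```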

The closest I get to matching the first half of this range is this:

m = re.findall(r'\[20\] ', body)
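The backslashes in front of the square brackets matter: in regular-expression syntax, an unescaped [20] is a character class that matches a single '2' or '0', not the literal text [20]. A small comparison, assuming Python 3:

```python
import re

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

# Unescaped, [20] is a character class: each match is one '2' or one '0'.
as_class = re.findall(r'[20]', body)
print(as_class)    # ['2', '0', '2', '2']

# Escaped, \[20\] matches the literal text "[20]".
as_literal = re.findall(r'\[20\]', body)
print(as_literal)  # ['[20]']
```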

How would you match the entire range?

zura · Accepted Answer

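Escape the square brackets and put the en dash into the pattern literally, with the Unicode flag, like this:

m = re.findall(r'\[\d+\] – \[\d+\]', body, re.UNICODE)

This should return ['[20] – [22]'] (a list holding the matched range) from the string you've specified. In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is redundant there; the flag only changes how classes such as \d, \w and \s behave, not how a literal character like – is matched.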
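If the separator might be either a plain hyphen or an en dash, or if the numbers themselves are needed, a character class plus capture groups can cover both. This is a sketch that goes beyond the accepted answer, assuming Python 3; the names range_re and ranges are illustrative only:

```python
import re

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

# The answer's pattern without re.UNICODE (already the default for str patterns in Python 3).
print(re.findall(r'\[\d+\] – \[\d+\]', body))  # ['[20] – [22]']

# Illustrative extension: accept an ASCII hyphen-minus or an en dash (U+2013)
# between the bracketed numbers, allow optional whitespace, and capture both numbers.
range_re = re.compile(r'\[(\d+)\]\s*[-\u2013]\s*\[(\d+)\]')

ranges = []
for m in range_re.finditer(body):
    start, end = int(m.group(1)), int(m.group(2))
    # Keep the matched text and the full list of cited reference numbers.
    ranges.append((m.group(0), list(range(start, end + 1))))

print(ranges)  # [('[20] – [22]', [20, 21, 22])]

# The same pattern also matches a plain hyphen.
print(range_re.findall('see [3]-[5]'))  # [('3', '5')]
```

Putting the hyphen first inside [-\u2013] makes it a literal character rather than a range operator.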
