user2042696

Regular expression to match phrase with unicode character

I am trying to match the indicated range in the following sentence with regular expressions in Python (the re module), but have had no luck so far:

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

I'm trying to match

[20] – [22]

where the problem appears to be that the hyphen is not the usual - but some Unicode hyphen.

The closest I get to matching the first half of this range is this:

m = re.findall(r'\[20\] ', body)

How would you match the entire range?

Upvotes: 0

Views: 182

Answers (1)

zura

Reputation: 106

You need to put the Unicode dash itself in the pattern and pass the re.UNICODE flag, like this:

m = re.findall(r'\[\d+\] – \[\d+\]', body, re.UNICODE)

This should return [20] – [22] from the string you've specified.
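
If you are not sure which dash you are dealing with, here is a minimal sketch, assuming Python 3 (where str patterns are Unicode-aware by default, so the re.UNICODE flag is not strictly needed). It first names the non-ASCII characters in the string, then matches the range with a character class covering the common Unicode dashes plus the ASCII hyphen. The names dash_names and range_pattern are only for illustration.

```python
import re
import unicodedata

body = 'Adulticides are modelled by increasing the mosquito mortality rate [9] , [20] – [22] .'

# Name every non-ASCII character so you can see which dash is in the text.
dash_names = {ch: unicodedata.name(ch) for ch in set(body) if ord(ch) > 127}
print(dash_names)

# U+2010..U+2015 covers hyphen, non-breaking hyphen, figure dash, en dash,
# em dash and horizontal bar; the trailing '-' adds the plain ASCII hyphen.
range_pattern = re.compile(r'\[\d+\]\s*[\u2010-\u2015-]\s*\[\d+\]')

matches = range_pattern.findall(body)
print(matches)
```

For the string in the question, the character turns out to be an en dash (U+2013). Using the \u escapes keeps the pattern readable even if your editor or terminal does not show the dash clearly, and \s* tolerates missing or extra spaces around it.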

Upvotes: 2
