Jonathan Livni

Reputation: 107102

Using end of word mark with unicode in regular expressions in Python

The following matches in IDLE, but does not match when run from a method in a module file:

import re
re.search('\\bשלום\\b','שלום עולם',re.UNICODE)

while the following matches in both cases:

import re
re.search('שלום','שלום עולם',re.UNICODE)

(Note that Stack Overflow displays the first and second items in the line above in the wrong order, because this is a right-to-left language.)

How can I make the first code match inside a py file?

Update: What I should have written about the first snippet is that it matches in IDLE, but does not match when run in the Eclipse console with PyDev.

Upvotes: 3

Views: 280

Answers (1)

Kobi

Reputation: 138027

It seems to work for me when I use unicode strings:

# -*- coding: utf-8 -*-

import re
match = re.search(u'\\bשלום\\b', u'שלום עולם', re.U)

See it in action: http://codepad.org/xWz5cZj5
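
To illustrate why the unicode strings might matter, here is a minimal sketch that runs the same search twice: once on text and once on UTF-8 encoded bytes. It assumes the failing environment ends up handing byte strings to re, which the question does not confirm. The names word, text, text_match and bytes_match are only for illustration.

```python
# -*- coding: utf-8 -*-
import re

word = u'שלום'
text = u'שלום עולם'

# Text (unicode) pattern and subject: Hebrew letters count as word
# characters, so \b finds a boundary on each side of the word.
text_match = re.search(u'\\b' + word + u'\\b', text, re.UNICODE)

# Assumption: the non-matching case corresponds to byte strings.
# In UTF-8 every Hebrew letter is two non-ASCII bytes, and none of them
# count as word characters, so \b never finds a boundary.
bytes_match = re.search(b'\\b' + word.encode('utf-8') + b'\\b',
                        text.encode('utf-8'))

print(text_match is not None)
print(bytes_match is not None)
```

If the byte-string assumption holds for your setup, decoding the input to unicode before searching (or writing the literals with the u prefix, as above) should make the \b version match.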

Upvotes: 2
