Hank Smith

Unicode characters and regex

I am trying to run the following code in Python 2:

# -*- coding: utf-8 -*-
import re
data = "&city=Zayas de Báscones;Zayas de Báscones;"
arr = re.findall(ur'[&]{1}\w{4}=[a-zA-ZA-Za-z£€ßçÇáàâäæãåèéêëîïíìôöòóøõûüùúÿñÁÀÂÄÆÃÅÈÉÊËÎÏÍÌÔÖÒÓØÕÛÜÙÚŸÑðÐ]+(?:[\s-][a-zA-ZA-Za-z£€ßçÇáàâäæãåèéêëîïíìôöòóøõûüùúÿñÁÀÂÄÆÃÅÈÉÊËÎÏÍÌÔÖÒÓØÕÛÜÙÚŸÑðÐ]+)*',data)
x = "".join(arr)
x = x.split('&city=')
print x

The result:

['', 'Zayas de B?scones']

How can I get the Unicode character instead of the question mark? I have tried putting a 'u' prefix on the regex pattern (e.g. u'pattern') and also 'ur' before the pattern.
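In Python 2 a plain "..." literal is a byte string, so in a UTF-8 source file "á" is stored as the two bytes \xc3\xa1 and the regex sees them as two separate characters. As a rough illustration only, the sketch below reproduces a similar effect in Python 3, where bytes and text are distinct types. The variable names are made up for the example, and the ASCII-only class is a simplification of the character class in the question.

```python
import re

city = "Zayas de Báscones"
raw = city.encode("utf-8")  # bytes, roughly what a plain Python 2 "..." literal holds

# "á" is one character of text but two bytes once encoded as UTF-8.
print(len("á"), len("á".encode("utf-8")))  # 1 2

# A bytes pattern matches byte by byte, so the ASCII letter class stops at b"\xc3".
byte_match = re.match(rb"[A-Za-z ]+", raw)
print(byte_match.group())  # b'Zayas de B'

# Decoding to text first lets a str pattern treat "á" as a single word character.
text_match = re.match(r"[\w ]+", raw.decode("utf-8"))
print(text_match.group())  # Zayas de Báscones
```

The general idea is to decode input to text before matching, rather than relying on the pattern's prefix alone.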


Answers (1)

user1301404

If you try to print x[1]:

print x[1]
#output: Zayas de B?

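Without the u prefix, data is a byte string, so "á" is stored as two UTF-8 bytes that the character class does not match as a single letter. Now declare the data string as Unicode instead: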

data = u"&city=Zayas de Báscones;Zayas de Báscones;" # set it as unicode

Then print x[1] again:

print x[1]
#output: Zayas de Báscones
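For readers on Python 3, where str is already Unicode and \w in a str pattern is Unicode-aware by default, the hand-written list of accented letters can likely be dropped. The sketch below is one possible rewrite, not the answer's code: it swaps the explicit character class for [^\W\d_] (a word character that is neither a digit nor an underscore, i.e. a letter) and otherwise keeps the question's logic. The LETTERS name is introduced only for readability. In Python 2 the same idea would also need a u"..." data string and the re.UNICODE flag, since \w there matches only ASCII word characters by default.

```python
import re

data = "&city=Zayas de Báscones;Zayas de Báscones;"

# [^\W\d_] = any Unicode letter: a word character that is neither a digit nor "_".
LETTERS = r"[^\W\d_]"
pattern = r"&\w{4}=" + LETTERS + r"+(?:[\s-]" + LETTERS + r"+)*"

arr = re.findall(pattern, data)
x = "".join(arr).split("&city=")
print(x)     # ['', 'Zayas de Báscones']
print(x[1])  # Zayas de Báscones
```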
