Pankus

Reputation: 1417

Find a unicode character in string with Python

I'm new to Python, and maybe this question is not so smart, but I cannot solve this small issue. To find a character or a substring in a string, for instance in a conditional statement, I usually write the following code:

if 'a' in myvariable:
    <do something>

However, if the character or substring is a Unicode character with a high code point, for instance ⸣ (a half square bracket), I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128).

I understand the problem, but I cannot figure out how to solve it.

Of course, I'm working with Python 2.7.

EDIT

This is my actual code, followed by some clarifications:

if '⸣' not in myvariable:
    newvariable = 100.0

I have to test whether '⸣' is not in myvariable. The type of myvariable is already <type 'unicode'>, whereas the Unicode character '⸣' (code point U+2E23) is outside the ASCII range. Moreover, the script already uses the pragma # -*- coding: utf-8 -*-.

Many thanks to all

Upvotes: 1

Views: 8678

Answers (4)

Mark Ransom

Reputation: 308148

This is why implicit conversion of byte strings to Unicode strings was removed in Python 3.
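To make that implicit conversion concrete, here is a minimal sketch that performs by hand the step Python 2 does silently when a byte string meets a Unicode string. The variable names are only illustrative, and the snippet is written so that it should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# With a UTF-8 source file, the Python 2 literal '⸣' is a 3-byte string.
raw = u'\u2e23'.encode('utf-8')

# Python 2 decodes such a byte string with the ASCII codec before comparing
# it with a unicode object. Doing the same decode explicitly fails the same way.
try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode('utf-8')
```

The first byte of the UTF-8 sequence is 0xe2, which is the byte named in the error message from the question.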

You're almost there, with the #coding line at the start of your file. Just one tiny change to turn your test character into a Unicode string:

if u'⸣' not in myvariable:
    newvariable = 100.0

You might have trouble with that particular character as I did on my system, so you can use the equivalent escape sequence instead:

if u'\u2e23' not in myvariable:
    newvariable = 100.0
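If myvariable might sometimes arrive as a byte string rather than as Unicode, one option is to normalize it before testing. The sketch below assumes the bytes are UTF-8 encoded; to_text and lacks_half_bracket are hypothetical helper names, not part of any library.

```python
# -*- coding: utf-8 -*-
HALF_BRACKET = u'\u2e23'  # the character ⸣, written as an escape sequence


def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding it if it is a byte string.

    The UTF-8 default is an assumption; use whatever encoding your data has.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


def lacks_half_bracket(value):
    """True when the half square bracket does not occur in value."""
    return HALF_BRACKET not in to_text(value)


if lacks_half_bracket(u'plain text'):
    newvariable = 100.0
```

Both sides of the `in` test are Unicode here, so no implicit ASCII decoding takes place.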

Upvotes: 2

DEVV911

Reputation: 448

You can also try changing the file encoding to make it work. Refer to this doc: https://www.python.org/dev/peps/pep-0263/

You can change the file's encoding type to UTF-8 by adding this to your source file:

# -*- coding: utf-8 -*-

Example

# -*- coding: utf-8 -*-
b = '⸣fdsf'
if 'd' in b:
    print 'd'

Upvotes: 0

Rinsen S

Reputation: 481

You can declare the string as Unicode, e.g. var = u'e', and then call var.find('a') to look for the character in the Unicode variable.

Hope this works !!
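As a small illustration of the find approach, assuming both the text and the search character are Unicode strings (the sample values are made up):

```python
# -*- coding: utf-8 -*-
text = u'abc\u2e23def'    # sample Unicode string containing ⸣
needle = u'\u2e23'

index = text.find(needle)     # position of the first match
missing = text.find(u'z')     # find returns -1 when there is no match

# A membership test gives the same yes/no answer without the index.
found = needle in text
```

Note that find returns -1 rather than False when nothing matches, so the result should be compared against -1 rather than used directly as a condition.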

Upvotes: 0

Solki

Reputation: 1

Work with Python 3? 😃 I think you can import a module for text, no?

Upvotes: -1
