maq

Reputation: 1236

How to check if a string is 100% ASCII in Python 3

I have two strings:

eng = "Clash of Clans – Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

Now I want to check whether a string is in English or not using Python 3.

I have read this Stack Overflow answer, but it does not help me because it is a Python 2.x solution. In the comments, someone mentioned using

string.encode('ascii')

to make it work in Python 3.x, but my problem is that in both cases it raises the same UnicodeEncodeError exception!

So now I am stuck and can't figure out how to make it work. Kindly guide me, or tell me whether I have to use another method to determine if a string is in English. Thanks!

Upvotes: 3

Views: 5669

Answers (3)

Nishant

Reputation: 31

You can use the str.isascii() method (available since Python 3.7):

>>> rus.isascii()
False
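As a minimal sketch of how this applies to both strings from the question: the helper name is_ascii and the retyped string plain are assumptions added for illustration, and the fallback branch is only there for interpreters older than 3.7.

```python
# Strings from the question; the en dash is written as an escape so it is visible.
eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"
# Hypothetical retyped version that uses a plain ASCII hyphen-minus.
plain = "Clash of Clans - Android Apps on Google Play"


def is_ascii(s):
    """Return True if every character of s is in the ASCII range."""
    try:
        return s.isascii()  # str.isascii() exists on Python 3.7+
    except AttributeError:
        # Fallback for older Python 3 versions.
        return all(ord(ch) < 128 for ch in s)


print(is_ascii(eng), is_ascii(rus), is_ascii(plain))
```

Note that this only tests for ASCII characters. It says nothing about whether the text is actually written in English.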

Upvotes: 3

Mark Ransom

Reputation: 308432

Your English string isn't pure ASCII: it contains the character U+2013 (EN DASH). This looks very similar to the ASCII hyphen-minus U+002D, but it is a different character.

If this is the only character you need to worry about, you can do a simple replacement to make it work:

>>> eng.replace('\u2013', '-').encode('ascii')
b'Clash of Clans - Android Apps on Google Play'
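If you are not sure which characters are the culprits, one way to find them is to list every non-ASCII character with its position and Unicode name. This is only a sketch; the function name non_ascii_chars is an assumption made up for this example.

```python
import unicodedata

# The question's English string, with the en dash written as an escape.
eng = "Clash of Clans \u2013 Android Apps on Google Play"


def non_ascii_chars(s):
    """Return (index, character, Unicode name) for each non-ASCII character in s."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 127
    ]


for index, ch, name in non_ascii_chars(eng):
    print(index, repr(ch), name)
```

For the eng string this reports a single EN DASH at index 15, which is the character the replace() call above swaps out.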

Upvotes: 3

Hayley Guillou

Reputation: 3973

As in Salvador Dali's answer that you linked to, you must use a try/except block to check for an encoding error.

# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True

Just to note, though: when I copied and pasted your eng and rus strings to try them, they both came up as False. Retyping the English one returned True, so I'm not sure what's up with that.
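A likely explanation, going by the other answer about U+2013, is that the pasted string contains an en dash while the retyped one contains a plain hyphen. The sketch below repeats the function so it runs on its own; the variable names pasted and retyped are assumptions for illustration.

```python
def isEnglish(s):
    """Return True if s can be encoded as ASCII, False otherwise."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True


# Pasted from the question: contains U+2013 (EN DASH).
pasted = "Clash of Clans \u2013 Android Apps on Google Play"
# Retyped by hand: contains the ASCII hyphen-minus instead.
retyped = "Clash of Clans - Android Apps on Google Play"

print(isEnglish(pasted), isEnglish(retyped))
```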

Upvotes: 5
