Reputation: 187
I'm trying to read usernames from a database, and whenever one contains non-UTF-8 characters, it throws a UnicodeDecodeError.
I don't know which characters are not valid UTF-8, and I'm looking for a solution.
I want to keep special symbols and filter out only the ones that aren't compatible with UTF-8. The only two I know of that don't work are ³ and ™ (trademark).
I still want to keep Chinese characters, Arabic, etc. That's why I'm using UTF-8.
Code:
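def is_author_used(author):
    # Return True if the author is already listed in the file (one name per line).
    with open("C:\\Users\\Administrator\\Desktop\\authors.txt", 'r', encoding='utf-8') as f:
        content = f.read().splitlines()
    if author in content:
        return True
    return False

def set_author_used(author):
    # Append the author to the file on its own line.
    with open("C:\\Users\\Administrator\\Desktop\\authors.txt", 'a', encoding='utf-8') as f:
        # Text mode already turns '\n' into '\r\n' on Windows, so write '\n' only.
        f.write(author + '\n')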
Upvotes: 1
Views: 6919
Reputation: 30453
Maybe something like this:
    with open('text.txt', encoding='utf-8', errors='ignore') as f:
        content = f.read().splitlines()
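To see what errors='ignore' actually does, here is a small self-contained sketch. Note that ³ and ™ are valid Unicode characters and can be stored as UTF-8. A UnicodeDecodeError on them usually means the bytes in the file were written in some other encoding. The sample below assumes that encoding is Windows-1252, which is only a guess about your data. The helper name read_lines and the sample names are made up for illustration.

```python
import os
import tempfile

def read_lines(path, errors="ignore"):
    """Read a file as UTF-8, handling undecodable bytes according to `errors`."""
    with open(path, encoding="utf-8", errors=errors) as f:
        return f.read().splitlines()

# Hypothetical sample data: two names saved as Windows-1252 bytes
# (assumed source encoding), followed by one name saved as real UTF-8.
raw = "user™\ncube³\n".encode("cp1252") + "李雷\n".encode("utf-8")

# Write the raw bytes to a temporary file so the example can run anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)

try:
    ignored = read_lines(tmp.name)                      # bad bytes are dropped
    replaced = read_lines(tmp.name, errors="replace")   # bad bytes become U+FFFD
finally:
    os.remove(tmp.name)

print(ignored)
print(replaced)
```

With 'ignore' the undecodable bytes silently disappear while the valid UTF-8 name is kept. errors='replace' puts the U+FFFD replacement character in their place instead, which may be easier to spot later. If you know the real encoding of the file, passing that as encoding= should keep ³ and ™ intact rather than dropping them.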
Upvotes: 3