Reputation: 77
I am trying to fetch the tweets of an account and write them to a file in a specific format. Sometimes the account uses emojis and other characters outside the codec, so when processing the tweets Python raises the following error. (The specific character it doesn't like is the Greek letter "χ", if that helps, although I need a fix that works for any character Python can't encode.)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c7' in position 4: character maps to <undefined>
I tried adding .encode("utf-8") to the end of the string, but that ends up writing the raw text data to the file, when I actually need the words written on separate lines. Here's the code I have so far. (The code itself works, in that it reads the data and puts it into the format I need, so I only need help with the writing-to-file part.)
with open("LSData.txt", "a") as file:
    for status in tl:
        wordList = status.full_text.split(" ")
        for word in wordList:
            try:
                if("http" not in word):
                    if(word == wordList[0] or
                       wordList[wordNum-1][len(wordList[wordNum-1])-1] == "." or
                       wordList[wordNum-1][len(wordList[wordNum-1])-1] == "!" or
                       wordList[wordNum-1][len(wordList[wordNum-1])-1] == "?"):
                        wordsToAdd = "-" + word + " " + wordList[wordNum+1] + "\n"
                        file.write(wordsToAdd)
                    else:
                        wordsToAdd = word + " " + wordList[wordNum+1] + "\n"
                        file.write(wordsToAdd)
            except(IndexError):
                pass
            wordNum += 1
If I need to provide more info, let me know. Thanks in advance!
Upvotes: 0
Views: 4369
Reputation: 6217
The short answer:
You need to open the file with the UTF-8 encoding.
with open("LSData.txt", "a", encoding="utf-8") as file:
The long answer:
The error you are seeing is raised when Python tries to write a character to a file whose encoding cannot represent that character.
In your code above you don't specify an encoding when you call open, so Python uses the default encoding for your locale. This varies by system, and it looks like the default encoding on your system doesn't support the Greek letter "χ". (The 'charmap' in the message suggests a single-byte code page, such as cp1252 on Windows.)
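If you want to check which default applies on your machine, something like the following sketch may help. It assumes that `locale.getpreferredencoding(False)` reflects the encoding `open` falls back to, which holds for most Python 3 setups but can differ if UTF-8 mode is enabled. The helper `can_encode` is a hypothetical name introduced only for this illustration.

```python
import locale

# The encoding open() typically falls back to when no encoding is given
# (assumption: UTF-8 mode is not changing the default).
default_encoding = locale.getpreferredencoding(False)
print("Default encoding:", default_encoding)


def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


chi = "\u03c7"  # the Greek letter from the error message
print("Default can encode chi:", can_encode(chi, default_encoding))
print("cp1252 can encode chi:", can_encode(chi, "cp1252"))
print("utf-8 can encode chi:", can_encode(chi, "utf-8"))
```

If the first check reports that the default cannot encode the character, that would be consistent with the error in the question.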
To fix this, you need to specify an encoding when you open the file, and the encoding that you use needs to support all of the characters that you want to write. This usually means using one of the Unicode encodings, and the most common Unicode encoding is UTF-8. If you have a choice about what encoding should be used, then these days it is best practice to use UTF-8 whenever you open a file.
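As a rough illustration of the difference, here is a minimal sketch that writes the same line twice, once with a legacy encoding and once with UTF-8. The sample string and the use of cp1252 as the "bad" encoding are assumptions chosen to reproduce the error; the file is created in a temporary directory so nothing on your system is touched.

```python
import os
import tempfile

# Hypothetical sample line containing the Greek letter chi (U+03C7).
sample = "-\u03c7 is\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "LSData.txt")

    # cp1252 (assumed here as the legacy default) has no mapping for chi,
    # so the write raises UnicodeEncodeError.
    try:
        with open(path, "a", encoding="cp1252") as f:
            f.write(sample)
        cp1252_failed = False
    except UnicodeEncodeError:
        cp1252_failed = True

    # UTF-8 covers the full Unicode character set, so the same write succeeds.
    with open(path, "a", encoding="utf-8") as f:
        f.write(sample)

    # Read the file back with the same encoding it was written with.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print("cp1252 write failed:", cp1252_failed)
print("UTF-8 round trip ok:", read_back == sample)
```

Note that whatever reads the file later should also open it with encoding="utf-8", otherwise the non-ASCII characters may come back garbled.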
If you want to read more about the fascinating details of how encodings work and why problems like this happen, this blog post is a good place to start.
Upvotes: 5