Reputation: 300
When I read some files like this:
list_of_files = glob.glob('./*.txt')  # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding='cp1252')
Error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1260: character maps to <undefined>
When I switch to this:
list_of_files = glob.glob('./*.txt')  # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding="utf-8")
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1459: invalid start byte
I have read that I should open the file in binary mode, but I'm not sure how to do that. Here is my function:
def readingAndAddToList():
    list_of_files = glob.glob('./*.txt')  # create the list of files
    for file_name in list_of_files:
        FI = open(file_name, 'r', encoding="utf-8")
        stext = textProcessing(FI.read())
        # split returns a list of words delimited by sequences of whitespace (including tabs, newlines, etc, like re's \s)
        secondaryWord_list = stext.split()
        word_list.extend(secondaryWord_list)  # Add words to main list
        print("Lungimea fisierului ", FI.name, " este de", len(secondaryWord_list), "caractere")
        sortingAndNumberOfApparitions(secondaryWord_list)
        FI.close()
Only the beginning of the function matters, because the error occurs while reading the file.
Upvotes: 2
Views: 1986
Reputation: 62
If you are on Windows, open the file in Notepad and save it with the desired encoding. On Linux, do the same in a text editor. Hopefully your program will then run.
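If re-saving every file by hand is not practical, the same idea can be approximated in code. The following is only a minimal sketch, assuming the files are mostly UTF-8 or cp1252 with a few stray bytes (such as the 0x9d from the first traceback, which cp1252 does not define). The helper name read_text_lenient and its encodings parameter are made up for this illustration and are not part of the original program.

```python
from pathlib import Path


def read_text_lenient(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: read raw bytes, then try each encoding in turn."""
    # Reading in binary mode never raises UnicodeDecodeError,
    # because no decoding happens at this point.
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    # Last resort (an assumption, not a fix for the file itself):
    # decode with the final encoding and substitute U+FFFD
    # for every byte that cannot be decoded.
    return raw.decode(encodings[-1], errors="replace")
```

In readingAndAddToList, the call textProcessing(FI.read()) could then become textProcessing(read_text_lenient(file_name)). Note that errors="replace" hides the bad bytes rather than repairing them, so it is worth checking the affected files if the exact characters matter.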
Upvotes: 1