Reputation: 151
I have a file with a lot of entries about Nobel prizes. I then convert that file into a list like this:
file = open(path, 'r')
file.readline()
content = []
for line in file:
    line = line.replace('\n', '')
    content.append(line.split(';'))
content = check(content, 'röntgen')
After that I have a function that takes that list and another argument and checks whether the list contains that argument. However, if the argument contains a special character like the Ö, it doesn't work, because when the file is read Python saves it like: ö
def check(content, attr):
    reducedList = []
    for i in range(len(content)):
        curr = content[i][4]
        if curr.find(attr) != -1:
            reducedList.append(content[i])
    return reducedList
with:
curr = 'voor hun verdiensten op het gebied van de analyse van de kristalstructuur door middel van röntgenstraling'
attr = 'röntgen'
I have tried converting it with utf-8 but that doesn’t seem to help. Does anyone have a solution?
Upvotes: 1
Views: 476
Reputation: 151
The solution is to replace open(path, 'r')
with open(path, 'r', encoding='utf-8')
If you add the encoding parameter, Python will make sure the file is read as UTF-8, so when you compare the strings they are truly the same.
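For reference, a minimal sketch of the whole read-and-filter step with the encoding parameter in place, written for Python 3. The sample rows and the temporary file are made up so the snippet can run on its own, and load_rows is just a hypothetical helper name; check does the same job as the function in the question.

```python
import os
import tempfile


def load_rows(path):
    """Read a semicolon-separated file as UTF-8 and skip the header line."""
    with open(path, 'r', encoding='utf-8') as handle:
        handle.readline()  # skip the header, as in the question
        return [line.rstrip('\n').split(';') for line in handle]


def check(content, attr):
    """Keep the rows whose fifth column contains attr."""
    return [row for row in content if attr in row[4]]


# Hypothetical sample data, written as UTF-8 so the example is self-contained.
sample = (
    'col0;col1;col2;col3;motivation\n'
    'a;b;c;d;analyse van de kristalstructuur door middel van röntgenstraling\n'
    'a;b;c;d;een andere motivatie zonder dat woord\n'
)
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.csv',
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

matches = check(load_rows(path), 'röntgen')
os.remove(path)  # clean up the temporary file

print(len(matches))
```

Opening the file in a with block also closes it again, which the original snippet never did.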
Upvotes: 0
Reputation: 110301
This happens because you are using Python 2, likely on Windows, and your file is encoded in utf-8, not latin-1.
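To see the mismatch in isolation, here is a small sketch (written for Python 3, with a made-up sample word) of what happens when UTF-8 bytes are decoded with a single-byte codec such as latin-1: the two bytes of 'ö' become two separate characters, so the search string no longer matches.

```python
# The text as it should appear, and the bytes a UTF-8 file stores for it.
text = 'röntgenstraling'
raw = text.encode('utf-8')

# Decoding those bytes with the wrong codec splits 'ö' into two characters.
wrong = raw.decode('latin-1')
right = raw.decode('utf-8')

print(ascii(wrong))        # 'r\xc3\xb6ntgenstraling'
print('röntgen' in wrong)  # False
print('röntgen' in right)  # True
```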
The best thing you can do, instead of trying to fix it at random (including with the first comments to your question: they are all random suggestions), is to understand what is going on. So, stop what you are trying to do.
Then, switch to Python 3 if you can - that should handle most issues automatically.
If you can't, you have to deal properly with the text decoding and re-encoding manually - the concepts are on the link above. Assume your input files are in utf-8.
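A rough sketch of what doing it by hand can look like: read the raw bytes, decode them explicitly as UTF-8, work on the decoded text, and encode again only when writing out. It assumes the input really is UTF-8; the sample data, the temporary file and the load_rows helper are invented for the example. The u'' literals and the coding line are there so the same code should also be usable on Python 2.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def load_rows(path):
    """Read raw bytes, decode them as UTF-8 by hand, and split into fields."""
    with io.open(path, 'rb') as handle:
        text = handle.read().decode('utf-8')
    lines = text.splitlines()[1:]  # drop the header line
    return [line.split(u';') for line in lines]


def check(content, attr):
    """Keep the rows whose fifth column contains attr."""
    return [row for row in content if attr in row[4]]


# Hypothetical sample data, stored on disk as UTF-8 bytes.
sample = (
    u'col0;col1;col2;col3;motivation\n'
    u'a;b;c;d;door middel van röntgenstraling\n'
    u'a;b;c;d;een andere motivatie\n'
)
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with io.open(path, 'wb') as handle:
    handle.write(sample.encode('utf-8'))

matches = check(load_rows(path), u'röntgen')
os.remove(path)  # clean up the temporary file

# Re-encode only at the boundary, e.g. just before writing the result out.
encoded = matches[0][4].encode('utf-8')
print(len(matches))
```

The point is that every comparison happens between decoded text values, never between raw bytes and text.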
Upvotes: 1