I am very new to Python scripting, and I have a very simple task that I would like to perform, but I seem to be stuck on it. All I am trying to do is read data from a .txt file and parse it.
Steps I have taken
sjsuclassdata.txt: text/plain; charset=unknown-8bit
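The `charset=unknown-8bit` result means the file contains non-ASCII bytes whose encoding could not be identified. A minimal Python sketch for confirming that the file is not valid UTF-8, and for finding where decoding first fails, could look like the following. The function names are hypothetical, and the file is read in binary mode so that nothing is decoded when it is opened.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte UTF-8 cannot decode, or None if all bytes decode."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first rejected byte, i.e. the
        # "position" shown in a UnicodeDecodeError message.
        return exc.start
    return None


def report_utf8_validity(path: str) -> None:
    """Print whether the file at `path` is valid UTF-8 (hypothetical helper)."""
    # Binary mode reads raw bytes without decoding, so this read cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        data = f.read()
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        print("{}: valid UTF-8".format(path))
    else:
        # Show a few bytes around the problem to help guess the real encoding.
        snippet = data[max(0, offset - 8):offset + 8]
        print("{}: not valid UTF-8, first bad byte at offset {}: {!r}".format(path, offset, snippet))


# Example (hypothetical usage):
# report_utf8_validity("sjsuclassdata.txt")
```

`str.format` is used instead of f-strings so the sketch also runs on Python 3.5, the version shown in the traceback.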
Error message that I got:
Traceback (most recent call last):
File "/Users/edward/MyPythonScripts/sjsuClassExtractor.py", line 25, in <module>
regexMatches = lectureRegex.findall(file.read())
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 9: invalid continuation byte
As you can see, I am really lost as to what I'm supposed to do from here. I have verified that everything works if I read a different file that contains similar data.
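The error can be reproduced without the file. In UTF-8, the byte 0xD0 starts a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF; any other following byte produces this "invalid continuation byte" error. A minimal sketch, using hypothetical sample bytes chosen so that 0xD0 lands at position 9:

```python
# Hypothetical sample: nine ASCII bytes, then 0xD0 followed by a space.
sample = b"123456789\xd0 "

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    # Prints: 'utf-8' codec can't decode byte 0xd0 in position 9: invalid continuation byte
    print(error_message)

# ISO-8859-1 (Latin-1) maps every byte 0x00-0xFF to a character, so this never fails.
# It is only the *right* interpretation if the file really is Latin-1.
text = sample.decode("iso-8859-1")
print(repr(text))  # '123456789Ð '
```

The same applies when opening the file: passing `encoding="iso-8859-1"` to `open()` makes `file.read()` decode with that codec instead of the platform default (UTF-8 on this system, judging from the traceback), at the cost of assuming the encoding is correct.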
Assuming that the original text file is ANSI-encoded (the default with Acrobat Reader's 'Save As Text' option), this command will convert it to UTF-8:
iconv -f "iso-8859-1" -t "utf-8" sjsuclassdata.txt > sjsuclassdata-utf8.txt
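For anyone who would rather stay in Python, the same conversion as the `iconv` command can be sketched with the standard library: decode the raw bytes as ISO-8859-1 and re-encode them as UTF-8. The helper names below are hypothetical, and the source encoding is the same assumption the answer makes.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode `data` using the assumed source encoding and return it encoded as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src: str, dst: str, source_encoding: str = "iso-8859-1") -> None:
    """Write a UTF-8 copy of `src` to `dst` (hypothetical helper mirroring the iconv call)."""
    # Binary I/O on both sides, so line endings are copied unchanged.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(reencode_to_utf8(data, source_encoding))


# Example (hypothetical usage):
# convert_file_to_utf8("sjsuclassdata.txt", "sjsuclassdata-utf8.txt")
```

If the file was actually saved as Windows-1252 (what Windows usually means by "ANSI"), pass `source_encoding="cp1252"` instead; the two encodings differ in bytes 0x80 to 0x9F, where Windows-1252 has printable characters such as curly quotes.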