Reputation: 2136
When I download a book's .txt file from the Project Gutenberg website and import it into the Microsoft GraphRAG indexer, I get the following error:
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
⠋ GraphRAG Indexer
Part of the problem was that the instructions in the Microsoft tutorial saved only the 200 OK output of the call rather than the file itself. But even after downloading the file manually and importing it from my local hard drive, I'm still getting the error.
Upvotes: 0
Views: 199
Reputation: 2136
The quick solution was to manually download the 'A Christmas Carol' book in my web browser from https://www.gutenberg.org/cache/epub/24022/pg24022.txt to my local hard drive, then copy and paste its contents into Notepad and save the file with UTF-8 encoding.
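For anyone who would rather script the "save as UTF-8" step than go through Notepad, here is a minimal Python sketch. It rests on an assumption I have not confirmed: that the 0xff at position 0 is the start of a UTF-16 byte order mark (FF FE), which is what that byte usually indicates. The function name `reencode_to_utf8` and the file names are hypothetical.

```python
import codecs
from pathlib import Path

# Byte order marks, checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def reencode_to_utf8(src, dst, fallback="utf-8"):
    """Rewrite src as BOM-less UTF-8 at dst; return the codec used to read it."""
    raw = Path(src).read_bytes()
    encoding = fallback  # assumption: files without a BOM are already UTF-8
    for bom, name in _BOMS:
        if raw.startswith(bom):
            encoding = name  # these codecs consume the BOM while decoding
            break
    text = raw.decode(encoding)
    # Write bytes directly so no BOM is added and line endings are untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return encoding
```

Usage would look like `reencode_to_utf8("pg24022.txt", "book.txt")`, with the output placed in the indexer's input folder. If the source has no BOM and is not valid UTF-8, the call raises `UnicodeDecodeError` rather than guessing.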
NOTE: Saving the .txt file from Notepad++ did not produce UTF-8 that the indexer recognized; running the indexer still threw the error.
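To see what an editor actually wrote before running the indexer again, the first bytes of the file can be inspected. The sketch below is only a diagnostic under my own assumptions: the name `check_utf8` is hypothetical, and I have not verified which of these forms the GraphRAG indexer accepts beyond the fact that the Notepad-saved file worked.

```python
import codecs
from pathlib import Path


def check_utf8(path):
    """Report whether a file starts with a BOM or fails strict UTF-8 decoding."""
    raw = Path(path).read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    # UTF-32 LE begins with the same two bytes as UTF-16 LE, so they are grouped.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE, codecs.BOM_UTF32_BE)):
        return "utf-16 or utf-32 (BOM found)"
    try:
        raw.decode("utf-8")  # strict decoding, like the step that failed
    except UnicodeDecodeError as exc:
        return f"not utf-8: bad byte at position {exc.start}"
    return "plain utf-8"
```

A result of "utf-16 or utf-32 (BOM found)" would match the `byte 0xff in position 0` message in the traceback.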
Here's a working solution including further troubleshooting tips: https://github.com/rcorvus/MicrosoftGraphRAG
Upvotes: 0