Robert Corvus

Reputation: 2136

Microsoft GraphRAG Indexer can't import txt file from website

When I download a book's .txt file from the Project Gutenberg website and import it into the Microsoft GraphRAG indexer, I get the following error:

File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
⠋ GraphRAG Indexer

Part of the problem was that the instructions in the Microsoft tutorial saved only the 200 OK output of the request rather than the file itself. But even after downloading the file manually and importing it from my local hard drive, I still get the error.

Upvotes: 0

Views: 199

Answers (1)

Robert Corvus

Reputation: 2136

The quick solution was to manually download the 'A Christmas Carol' book in my web browser from https://www.gutenberg.org/cache/epub/24022/pg24022.txt to my local hard drive, then copy and paste its contents into Notepad and save the file with the encoding set to UTF-8 in the Save As dialog.
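
For anyone who would rather script the re-save than go through Notepad, here is a minimal Python sketch. It rests on an inference from the error message, not on anything confirmed above: a byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (FF FE), which would mean the file on disk is UTF-16 rather than UTF-8. The function names and file names are made up for illustration.

```python
import codecs
from pathlib import Path


def detect_bom_encoding(raw: bytes) -> str:
    """Guess the encoding from a leading byte order mark (assumption: the file has one)."""
    # UTF-32 must be checked before UTF-16: the UTF-32 LE BOM also starts with FF FE.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM found: fall back to plain UTF-8 and let decoding fail loudly if it is not.
    return "utf-8"


def reencode_to_utf8(src: str, dst: str) -> str:
    """Read src in its BOM-indicated encoding and write dst as UTF-8 without a BOM."""
    raw = Path(src).read_bytes()
    text = raw.decode(detect_bom_encoding(raw))
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


if __name__ == "__main__":
    # Hypothetical file names; point these at your own download.
    reencode_to_utf8("pg24022.txt", "book.txt")
```

The "utf-16", "utf-32" and "utf-8-sig" codecs all consume the byte order mark while decoding, so the output file starts directly with the text.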

NOTE: Saving the file from Notepad++ did not produce a UTF-8 file that the indexer recognized; running the indexer still threw the error.
Here's a working solution including further troubleshooting tips: https://github.com/rcorvus/MicrosoftGraphRAG
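
Because an editor can report one encoding while the indexer still rejects the file, it may help to check the saved file before running the indexer. The sketch below only tests whether the bytes decode as strict UTF-8 and reports the first offending byte, in the same terms as the traceback. The function name and the file path are hypothetical.

```python
from pathlib import Path


def check_utf8(path: str) -> tuple[bool, str]:
    """Return (True, 'ok') if the file decodes as strict UTF-8, else (False, reason)."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return False, f"byte 0x{raw[exc.start]:02x} at position {exc.start}"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical path; use wherever your indexer input file lives.
    ok, detail = check_utf8("book.txt")
    print("valid UTF-8" if ok else f"not valid UTF-8: {detail}")
```

If this reports a bad byte at position 0, the file most likely starts with a byte order mark from a different encoding and needs to be re-saved or converted.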

Upvotes: 0
