Reputation: 11
I am working on a system that processes uploaded PDFs, extracts text, generates FAISS and Pickle files, and stores them in Google Cloud Storage (GCS). After processing, I attempt to load the FAISS and Pickle files back from GCS for further operations. However, I'm encountering an error during the unpickling process of the Pickle file, specifically related to the 'fields_set' attribute.
Error: Error in downloading and initializing FAISS and Pickle files: Error loading Pickle file from local file: 'fields_set'
Previously I was getting this error: Error in faiss::FileIOReader::FileIOReader(const char *) at /Users/runner/work/faiss-wheels/faiss-wheels/faiss/faiss/impl/io.cpp:68: Error: 'f' failed: could not open / download_intialize_from_gcs vector_store
Things that I tried so far to deal with this:
Upvotes: 0
Views: 214
Reputation: 11
This error generally occurs when the FAISS index’s associated data (often a pickle‐serialized Pydantic model, such as part of a “docstore” or an “index_to_docstore_id” mapping) is loaded in an environment where the internal Pydantic attributes differ from those when the file was saved. In your case, the unpickling process can’t find the expected attribute (namely, fields_set) on one of these objects, which typically happens because:
• There is a version mismatch of Pydantic (or a related library such as LangChain) between the environment that created the FAISS database and the one you’re using to load it. Pydantic’s internal attributes (like fields_set) can change between releases, so if you saved the index under one version and then try to load it under a different version, the expected attribute might be missing.
• The pickle file may have been serialized with assumptions about object state (for example, including internal fields) that no longer hold in the current version of the model. Developers sometimes work around this by “monkey‐patching” the model’s __setstate__ method so that it removes or adds the missing attribute during unpickling.
For example, one workaround seen in similar cases is to patch the Document class (or whichever Pydantic model is being unpickled) so that its __setstate__ method removes fields_set from the state before updating the instance’s __dict__. This isn’t ideal for production, but it can be a useful temporary fix if you trust the source of your pickle file.
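A minimal sketch of that kind of patch, using only the standard library. The Document class here is a hypothetical stand-in that mimics a model whose __setstate__ requires a fields_set entry; it is not the real LangChain or Pydantic class. The key name (it may appear as __fields_set__ in your installed version) and the fallback value are assumptions you would need to adapt.

```python
import pickle


class Document:
    """Hypothetical stand-in for the model stored in the docstore (not the real class)."""

    def __init__(self, page_content):
        self.page_content = page_content

    def __getstate__(self):
        # Simulates a pickle written by a version that did not store "fields_set".
        return {"__dict__": {"page_content": self.page_content}}

    def __setstate__(self, state):
        # Simulates a loader that insists on the key being present.
        self.__dict__.update(state["__dict__"])
        self.fields_set = state["fields_set"]


payload = pickle.dumps(Document("hello"))

try:
    pickle.loads(payload)
    error_text = None
except KeyError as exc:
    # str() of a KeyError is just the quoted key, which is why the
    # reported message contains nothing but the attribute name.
    error_text = str(exc)


def tolerant_setstate(self, state):
    """Restore the instance even when "fields_set" is absent from the state."""
    state = dict(state)  # copy so the caller's dict is not mutated
    saved_fields = state.pop("fields_set", None)
    self.__dict__.update(state.get("__dict__", {}))
    # Assumption: if nothing was saved, treat every restored field as "set".
    self.fields_set = set(self.__dict__) if saved_fields is None else set(saved_fields)


# Temporary patch: apply it before loading, and only for pickles you trust.
Document.__setstate__ = tolerant_setstate
restored = pickle.loads(payload)
```

With a real model you would assign the patched method to the actual class before calling the loader, and remove the patch once the index has been re-saved under the current library versions.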
In short, the error “'fields_set'” is raised because the state stored in the pickle does not carry the fields_set attribute that your current Pydantic version expects to restore. The solution is to use the same versions of all relevant libraries (Pydantic, LangChain, etc.) for both serialization (saving) and deserialization (loading) of your FAISS index. If that isn’t possible, you may need to patch the unpickling process as described, or rebuild the index from the source PDFs in the current environment.
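One way to catch a mismatch before unpickling is to save the library versions next to the index and compare them at load time. The sketch below is only an illustration: the package list, the helper names, and the idea of uploading a small JSON sidecar alongside the FAISS and Pickle files are assumptions, not part of any library.

```python
import json
from importlib import metadata

# Assumed list of packages whose versions affect the pickle; adjust as needed.
PACKAGES = ("pydantic", "langchain", "faiss-cpu")


def installed_versions(packages=PACKAGES):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(saved, current):
    """Return {package: (saved_version, current_version)} for every difference."""
    return {
        name: (saved[name], current.get(name))
        for name in saved
        if saved[name] != current.get(name)
    }


# At save time: serialize the versions (for example, into a sidecar file
# stored next to the index).
sidecar_text = json.dumps(installed_versions())

# At load time: read the sidecar back and compare before unpickling.
mismatches = find_mismatches(json.loads(sidecar_text), installed_versions())
if mismatches:
    print("Library versions differ from save time:", mismatches)
```

If the comparison reports a difference, pinning the loading environment to the saved versions is usually safer than patching the unpickling step.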
Upvotes: 1