user28146142

Reputation: 21

Vector store stuck with file counts in_progress, or vector store is empty

I am trying to upload 2 JSON files into an assistant's vector store using the official OpenAI Python library. I also want to use a specific chunking strategy, a different one for each file. I found 2 possible approaches to upload files into a vector store. The first is to create a vector store, then upload the files to it using the client.beta.vector_stores.files functions. The second is to upload the files first using client.files.create, then create a vector store and attach the previously uploaded files.

However, neither approach works for me. The first one executes without exceptions, but the vector store is empty, containing 0 files. The second one also executes without exceptions, but leaves the vector store's file count at in_progress = 2, and it remains stuck there.

Why is this happening?

I am using this code to create a vector store and then upload files to it with specific chunking strategies. The result is that the vector store is empty, and no exception is thrown.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

vector_store = client.beta.vector_stores.create(
    name="human labeled dataset",
)

client.beta.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("results/results_tsm_human_labeled.json", "rb"),
    poll_interval_ms=1000,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 100, "chunk_overlap_tokens": 5},
    },
)

client.beta.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("data/sample_tsm_new.json", "rb"),
    poll_interval_ms=1000,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 1000, "chunk_overlap_tokens": 400},
    },
)
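Since upload_and_poll returns the vector store file object, one way to narrow this down is to look at what each call returned instead of discarding it. The sketch below assumes the returned object exposes `id`, `status` and `last_error` (with `code` and `message`), as the vector store file type in the openai package does; `describe_upload` is a hypothetical helper name, not part of the library, and the stand-in object at the bottom uses made-up values.

```python
from types import SimpleNamespace


def describe_upload(vs_file):
    """Return a one-line summary of a vector store file object.

    Assumes `vs_file` has `id`, `status` and `last_error` attributes,
    like the object returned by upload_and_poll.
    """
    error = getattr(vs_file, "last_error", None)
    if error is not None:
        # A failed file carries the reason in last_error.code / last_error.message.
        return f"{vs_file.id}: {vs_file.status} ({error.code}: {error.message})"
    return f"{vs_file.id}: {vs_file.status}"


if __name__ == "__main__":
    # Stand-in object for illustration only (made-up values). In real use, pass
    # the return value of client.beta.vector_stores.files.upload_and_poll(...).
    fake = SimpleNamespace(
        id="file-123",
        status="failed",
        last_error=SimpleNamespace(code="example_code", message="example message"),
    )
    print(describe_upload(fake))
```

Printing this summary for both uploads would show whether the files ended up as completed, failed, or something else, which an empty vector store by itself does not reveal.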

Removing the chunking strategy parts has no effect. To debug, I tried it with the client.files functions, without a chunking strategy:

human_dataset_result_json_file = client.files.create(
    file=open("results/results_tsm_human_labeled.json", "rb"), purpose="assistants"
)
human_dataset_json_file = client.files.create(
    file=open("data/sample_tsm_new.json", "rb"), purpose="assistants"
)
vector_store = client.beta.vector_stores.create(
    name="human labeled dataset",
    file_ids=[human_dataset_result_json_file.id, human_dataset_json_file.id],
)

This results in the vector store's file count being stuck at in_progress = 2. So the 2 files are stuck in processing and never finish.
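Creating a vector store with file_ids returns before processing finishes, so in_progress = 2 right after creation may just be the initial state. A minimal sketch of re-checking the counts with a timeout follows; it assumes client.beta.vector_stores.retrieve returns an object whose `file_counts` has `in_progress`, `completed` and `failed` fields. `wait_for_vector_store` and its parameters are hypothetical names introduced for illustration.

```python
import time


def wait_for_vector_store(client, vector_store_id, timeout_s=120.0,
                          interval_s=2.0, sleep=time.sleep):
    """Poll a vector store until no files are in progress, or until timeout.

    Assumes client.beta.vector_stores.retrieve(...) returns an object with a
    `file_counts` attribute holding in_progress / completed / failed counters.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while True:
        store = client.beta.vector_stores.retrieve(vector_store_id)
        counts = store.file_counts
        if counts.in_progress == 0:
            # Processing is done; the caller can inspect completed / failed.
            return counts
        if waited >= timeout_s:
            raise TimeoutError(
                f"{counts.in_progress} file(s) still in progress after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

With the client from the question, this would be called as wait_for_vector_store(client, vector_store.id). If it times out, or returns with a non-zero failed count, the status of the individual files is the next thing to check.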

Interestingly, uploading the files into a vector store via the web UI works just fine.

Upvotes: 2

Views: 68

Answers (1)

rafayaar

Reputation: 79

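Check that your files contain valid JSON. Of the two forms below, only the first is valid JSON; the second, with an unquoted key, is a JavaScript object literal and is rejected by strict JSON parsers.
1-

    {
       "key":"value"
    }

2-

    {
       key:"value"
    }

Make sure your JSON follows the first form, then try again. You can use a linter or formatter to validate and format the JSON properly.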
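A quick way to check a file locally before uploading is to parse it with Python's standard json module, which follows the strict syntax (keys in double quotes). This is only a sketch: `is_strict_json` is a hypothetical helper name, and passing this check does not by itself guarantee that the vector store will accept the file.

```python
import json


def is_strict_json(text):
    """Return True if `text` parses with Python's json module, else False."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(is_strict_json('{"key": "value"}'))  # quoted key
    print(is_strict_json('{key: "value"}'))    # unquoted key
```

To check one of the files from the question, read it first, for example with open("data/sample_tsm_new.json", encoding="utf-8") as f, and pass f.read() to the helper.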

Upvotes: 0
