Reputation: 2232
Here is an example in which I try to index a large collection with Whoosh:
from whoosh.fields import Schema, TEXT, ID, KEYWORD
from whoosh.index import create_in
from whoosh.writing import BufferedWriter
from multiprocessing import Pool

schema = Schema(name=TEXT(stored=True), m=ID(stored=True), content=KEYWORD(stored=True))
ix = create_in("indexdir", schema)

jobs = []
writer = BufferedWriter(ix, period=15, limit=512, writerargs={"limitmb": 512})
for item in cursor:
    if len(jobs) < 1024:
        jobs.append(item)
    else:
        p = Pool(8)
        p.map(create_barrel, jobs)
        p.close()
        p.join()
        jobs = []
        writer.commit()
The create_barrel function ultimately does the following:
writer.add_document(name = name, m = item['_id'], content = " ".join(some_processed_data))
Yet after a few hours of running, the index is empty and the only file in indexdir is the lock file _MAIN_0.toc.
The code above kind of works when I switch to AsyncWriter, but for some reason AsyncWriter misses around 90% of the commits, and the standard writer is too slow for me.
Why does BufferedWriter miss commits?
Upvotes: 2
Views: 424
Reputation: 491
The code looks a little problematic for cases where the cursor iterator does not yield an exact multiple of 1024 items.
When the for-loop exits, the jobs list will still hold fewer than 1024 unprocessed items. Do you handle this remainder after the loop?
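To illustrate the remainder issue: a batching loop has to flush whatever is left over once the iterator is exhausted, otherwise the tail of the cursor is silently dropped. A minimal sketch of the pattern (a generic generator, not the asker's exact code; the Pool/create_barrel step would consume each yielded batch):

```python
def batched_consume(cursor, batch_size=1024):
    """Yield full batches from an iterable, then the final partial batch."""
    jobs = []
    for item in cursor:
        jobs.append(item)
        if len(jobs) >= batch_size:
            yield jobs
            jobs = []
    if jobs:           # remainder: fewer than batch_size items left
        yield jobs     # without this, the last < batch_size items are lost

# every item reaches the processing step exactly once
total = sum(len(batch) for batch in batched_consume(range(2500), 1024))
```

Here 2500 items produce batches of 1024, 1024, and 452; dropping the final `if jobs` flush would lose those last 452 documents.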
Besides that: which Whoosh version are you using?
Did you try the latest 2.4.x branch and the default branch code from the repo?
Upvotes: 1