Vito Piepoli

Reputation: 57

Whoosh locked statement while writing index

I am trying to build an index of parliamentary transcripts that I have already split by page and converted to .txt. I am using the latest version of Whoosh to build it, but it raises a LockError. This is the code:

import os
from whoosh import index
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
import sys

def createSearchableData(root):   

    '''
    Schema definition: title(name of file), path(as ID), content(indexed
    but not stored),textdata (stored text content)
    '''
    schema = Schema(title=TEXT(stored=True),path=ID(stored=True),\
              content=TEXT,textdata=TEXT(stored=True))
    if not os.path.exists("indexdir"):
        os.mkdir("indexdir")

    # Creating a index writer to add document as per schema
    ix = index.create_in("indexdir",schema)
    writer = ix.writer()

    filepaths = [os.path.join(root,i) for i in os.listdir(root)]
    for path in filepaths:
        fp = open(path,'rb')
        print(path)
        text = fp.read()
        writer.add_document(title=path.split("\\")[1], path=path,\
          content=text,textdata=text)
        fp.close()
    writer.commit()

root = "C:\\Users\\vitop\\OneDrive\\Desktop\\Final Project\\Test\\Splitted\\Txt"
createSearchableData(root)

This is the error I get:

LockError                                 Traceback (most recent call last)
<ipython-input-4-e8b4a33a2859> in <module>
     31 
     32 root = "C:\\Users\\vitop\\OneDrive\\Desktop\\Final Project\\Test\\Splitted\\Txt"
---> 33 createSearchableData(root)

<ipython-input-4-e8b4a33a2859> in createSearchableData(root)
     18     # Creating a index writer to add document as per schema
     19     ix = index.create_in("indexdir",schema)
---> 20     writer = ix.writer()
     21 
     22     filepaths = [os.path.join(root,i) for i in os.listdir(root)]

~\Anaconda3\lib\site-packages\whoosh\index.py in writer(self, procs, **kwargs)
    462         else:
    463             from whoosh.writing import SegmentWriter
--> 464             return SegmentWriter(self, **kwargs)
    465 
    466     def lock(self, name):

~\Anaconda3\lib\site-packages\whoosh\writing.py in __init__(self, ix, poolclass, timeout, delay, _lk, limitmb, docbase, codec, compound, **kwargs)
    513             if not try_for(self.writelock.acquire, timeout=timeout,
    514                            delay=delay):
--> 515                 raise LockError
    516 
    517         if codec is None:

LockError: 

The code created the folder and set up the index files. I tried the workaround suggested in the Whoosh documentation, but it does not work.

Upvotes: 2

Views: 1760

Answers (2)

CJay

Reputation: 36

I have not read your code, but I was facing the same LockError. I believe it happened because I ran writer = ix.writer() and then an unrelated error occurred, so the last line, writer.commit(), never ran and the writer kept its lock.

Solution: if you are using a Jupyter notebook, create a new code cell and run writer.commit(). This releases the writer.

Upvotes: 1

Vito Piepoli

Reputation: 57

Fixed by dropping the binary mode and opening the txt files as text with encoding "utf-8".

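# Uses os, filepaths and writer from the code in the question.
for path in filepaths:
    # Open in text mode so the content is read as str, not bytes
    with open(path, 'r', encoding="utf-8") as fp:
        print(path)
        text = fp.read()
    # os.path.basename gives the file name; path.split("\\")[1] returned
    # the second path component ("Users") for every file
    writer.add_document(title=os.path.basename(path), path=path,
                        content=text, textdata=text)
writer.commit()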
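
A likely reason this works, as far as I can tell: opening a file with 'rb' returns bytes, while Whoosh text fields expect str. With bytes, add_document presumably failed before writer.commit() was reached, which would leave the index locked on the next run. The sketch below uses only the standard library to show the difference between the two modes. read_text and the sample file name are made up for illustration.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file in text mode and return its content as str."""
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample page with non-ASCII characters.
    sample = os.path.join(tmp, "page_001.txt")
    with open(sample, "w", encoding="utf-8") as fp:
        fp.write("Città e società")

    # Binary mode: raw, undecoded bytes.
    with open(sample, "rb") as fp:
        raw = fp.read()

    # Text mode: decoded str, suitable for a text field.
    text = read_text(sample)

print(type(raw).__name__, type(text).__name__)  # bytes str
```

If the files are not actually UTF-8, read_text raises UnicodeDecodeError, so the encoding argument has to match how the txt files were produced.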

Upvotes: 1
