Logan McNulty

Reputation: 83

Atlas Search Index Build Failure

I am working with a large dataset (several thousand documents) and I am trying to construct an Atlas Lucene search index for a particular field in these documents. To give an idea of my data, here's a simplified version of my documents:

{
    name:'XYZ',
    lastUpdated: date 1,
    fundamentalData:{
        description: stuff,
        lastUpdatedFA: date 2,
        ...a lot more data
    },
    performanceData:[a lot of nested objects],
    otherPerformanceData:[more nested objects],
    ... more descriptive data
}
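
For context, the index definition I'm attempting is roughly a static mapping on just that one field, something along these lines (a sketch; the analyzer settings are left at their defaults and are not copied from my actual configuration):

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "fundamentalData": {
        "type": "document",
        "fields": {
          "description": {
            "type": "string"
          }
        }
      }
    }
  }
}
```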

The issue arises when I attempt to create a straightforward search index on the fundamentalData.description field. The build consistently fails with:

'Your index could not be built: Unexpected error: DocValuesField "$type:date/lastUpdated" appears more than once in this document (only one value is allowed per field)'

This error suggests that the 'lastUpdated' field is duplicated in a document. However, I've verified using Python that this isn't the case (see code at the end).

As a side note, I have a field fundamentalData.lastUpdatedFA that is structurally similar to lastUpdated, but my understanding is that this shouldn't matter as long as the names are not identical. I even ran an updateMany to rename that field to something completely different. No luck.

Interestingly, when I build the index the conventional way with db.collection.createIndex({ "fundamentalData.description": "text" }), everything works as expected. I'm aware that Atlas Search indexing differs significantly from the legacy createIndex method, but I'm not sure how that affects my case here.

I would appreciate any insights or suggestions. Thanks!

Logan

import time

def find_duplicate_lastUpdated(collection):
    """Collect names of documents where the key 'lastUpdated' appears more than once."""
    duplicate_lastUpdated_docs = []
    for doc in collection.find():
        # Count occurrences of the quoted key in the document's string representation;
        # the trailing quote keeps 'lastUpdatedFA' from matching
        lastUpdated_count = str(doc).count("'lastUpdated'")
        if lastUpdated_count > 1:
            duplicate_lastUpdated_docs.append(doc['name'])
        time.sleep(0.01)  # throttle: sleep 10 ms between documents
    return duplicate_lastUpdated_docs

collection = db["assetdatas"]
duplicates = find_duplicate_lastUpdated(collection)
len(duplicates)

# for name in duplicates:
#     print(name)
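In case it helps, here's a more robust check that walks the decoded document recursively instead of string-matching on str(doc). It's a self-contained sketch; the sample document below is made up to mirror the shape shown above:

```python
def count_key(obj, key):
    """Recursively count occurrences of `key` anywhere in a nested dict/list structure."""
    count = 0
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                count += 1
            count += count_key(v, key)  # keys may also appear in nested values
    elif isinstance(obj, list):
        for item in obj:
            count += count_key(item, key)
    return count

# Illustrative document shaped like the ones above (data is made up)
sample = {
    "name": "XYZ",
    "lastUpdated": "2023-01-01",
    "fundamentalData": {
        "description": "stuff",
        "lastUpdatedFA": "2023-01-02",
    },
    "performanceData": [{"lastUpdated": "2023-01-03"}],
}

print(count_key(sample, "lastUpdated"))  # counts the key at every nesting level -> 2
```

Unlike the string-based count, this also distinguishes lastUpdated from lastUpdatedFA by exact key comparison rather than quoting conventions.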

Upvotes: 0

Views: 190

Answers (0)
