I am working with a large dataset (several thousand documents) and I am trying to construct an Atlas Lucene search index for a particular field in these documents. To give an idea of my data, here's a simplified version of my documents:
{
  name: 'XYZ',
  lastUpdated: date 1,
  fundamentalData: {
    description: stuff,
    lastUpdatedFA: date 2,
    ...a lot more data
  },
  performanceData: [a lot of nested objects],
  otherPerformanceData: [more nested objects],
  ... more descriptive data
}
The issue arises when I try to create a simple search index on the fundamentalData.description field. The build consistently fails with this message:
'Your index could not be built: Unexpected error: DocValuesField "$type:date/lastUpdated" appears more than once in this document (only one value is allowed per field)'
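For reference, an index definition limited to that one field could be sketched as below. This is only a sketch under assumptions, not the definition actually used: the helper name `description_only_index` and the index name `description_search` are hypothetical, and the commented call assumes PyMongo 4.5 or later against an Atlas cluster. Whether a static mapping like this avoids the error is not confirmed.

```python
def description_only_index():
    """Build an Atlas Search index definition covering one string field.

    Dynamic mapping is switched off, so only the field listed
    under "fields" is indexed.
    """
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                "fundamentalData": {
                    "type": "document",
                    "fields": {
                        "description": {"type": "string"},
                    },
                },
            },
        },
    }


definition = description_only_index()
print(definition["mappings"]["fields"]["fundamentalData"]["fields"])

# Assumed usage with PyMongo >= 4.5 (not run here):
# from pymongo.operations import SearchIndexModel
# collection.create_search_index(
#     SearchIndexModel(definition=definition, name="description_search"))
```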
This error suggests that the 'lastUpdated' field is duplicated in a document. However, I've verified with Python that this isn't the case (see the code at the end).
As a side note, I have a field fundamentalData.lastUpdatedFA which is structurally similar to lastUpdated, but I've confirmed that this should not be an issue as long as the names are not identical. I even ran an updateMany that renamed that field to something completely different. No luck.
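A rename of that kind can be sketched in PyMongo roughly as follows. The helper `rename_field` and the replacement name in the usage comment are hypothetical (the actual new name is not given here); `update_many` and the `$rename` operator are the standard PyMongo and MongoDB pieces.

```python
def rename_field(collection, old_path, new_path):
    """Rename a field in every document with the $rename update operator.

    `collection` is expected to be a pymongo Collection (or anything
    exposing a compatible update_many method).
    """
    return collection.update_many({}, {"$rename": {old_path: new_path}})


# Assumed usage (the new name is a placeholder, not the one actually used):
# rename_field(db["assetdatas"],
#              "fundamentalData.lastUpdatedFA",
#              "fundamentalData.fundamentalsRefreshedAt")
```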
Interestingly, when I build a legacy text index instead with db.collection.createIndex( { "fundamentalData.description": "text" } ), everything works as expected. I'm aware that Atlas Search differs significantly from the legacy createIndex text index, but I'm not sure how that affects my case here.
I would appreciate any insights or suggestions. Thanks!
Logan
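import time

# `db` is assumed to be an existing pymongo Database handle.

def find_duplicate_lastUpdated(collection):
    """Return names of documents whose repr mentions 'lastUpdated' more than once."""
    duplicate_lastUpdated_docs = []
    for doc in collection.find():
        # Count occurrences of the key in the document's string form (any depth)
        lastUpdated_count = str(doc).count("'lastUpdated'")
        if lastUpdated_count > 1:
            duplicate_lastUpdated_docs.append(doc['name'])
        time.sleep(0.01)  # sleep for 10 milliseconds
    return duplicate_lastUpdated_docs

collection = db["assetdatas"]
duplicates = find_duplicate_lastUpdated(collection)
print(len(duplicates))
# for name in duplicates:
#     print(name)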
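A stricter variant of the check above walks each document instead of counting substrings. It rests on an unconfirmed assumption: that the error might refer to a path resolving to several values (an array of dates, or the same key repeated across the elements of an array) rather than to a literally repeated key. The helper names `find_key_paths` and `suspicious_last_updated` are hypothetical, and the sample document is made up for illustration.

```python
def find_key_paths(value, key, prefix=""):
    """Yield (dotted_path, field_value) for every occurrence of key, at any depth."""
    if isinstance(value, dict):
        for name, child in value.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                yield path, child
            yield from find_key_paths(child, key, path)
    elif isinstance(value, list):
        for child in value:
            # Array elements share the parent's path, as in MongoDB dot notation.
            yield from find_key_paths(child, key, prefix)


def suspicious_last_updated(doc, key="lastUpdated"):
    """Return paths where key occurs more than once or holds a list."""
    hits = list(find_key_paths(doc, key))
    counts = {}
    for path, _ in hits:
        counts[path] = counts.get(path, 0) + 1
    flagged = {path for path, n in counts.items() if n > 1}
    flagged |= {path for path, value in hits if isinstance(value, list)}
    return sorted(flagged)


# Made-up sample shaped like the simplified document in the question.
sample = {
    "name": "XYZ",
    "lastUpdated": "2023-01-01",
    "performanceData": [{"lastUpdated": "a"}, {"lastUpdated": "b"}],
}
print(suspicious_last_updated(sample))
```

Running `suspicious_last_updated(doc)` over `collection.find()` would list, per document, the paths worth inspecting by hand.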