Reputation: 255
I want to update a Mongo db collection with rows from a CSV file and I'm getting an error relating to my use of distinct
. What I'm trying to do is delete a document if its id is found in the CSV file. Here's the code:
# Read in CSV file
df = pd.read_csv(csv_path)
# Create list of ID values in CSV
pdToList = list(df['id'])
# Get ids from CSV
for counter, value in enumerate(pdToList):
# find distinct ids in collection
result = db[coll_name].distinct("id")
# Loop through IDs in collection
for id_val in result:
# Check if ID in collection equals ID in CSV
if id_val == value:
# Delete document if it exists in CSV
db[coll_name].delete_one({'id':id_val})
else:
pass
This script runs fine for about 100 tables, but then I get an error message on distinct being too big:
{'ok': 0.0, 'errmsg': 'distinct too big, 16mb cap', 'code': 17217, 'codeName': 'Location17217'}
. Do you know how I can get a field value from a document without getting this 16mb cap error? I've read similar questions, but haven't come up with a solution.
Upvotes: 1
Views: 893
Reputation: 255
Here is what I did. It was better to find
values that were equal to the values in the dataframe list, than try to do another for loop, probably better for my computer too.
# Read in CSV file
df = pd.read_csv(csv_path)
# Create list of ID values in CSV
pdToList = list(df['id'])
# Get ids from CSV
for counter, value in enumerate(pdToList):
# check if value is in document
if db[coll_name].find({},{"id":{"$eq": value}}):
# Delete document if it exists in CSV
db[coll_name].delete_many({'id':value})
else:
pass
Upvotes: 1