Hide
Hide

Reputation: 3317

pymongo huge collection return with list

You know that there is no join in mongodb, So I execute likes join query like this.

users = user_collection.find({"region": "US", `and some condition here`}, projection={"user_id": 1"})
user_list = [
    user['user_id']
    for user in users
]
posts = post_collection.find({"user_id": {"$in": user_list}, `and some condition here`)

(To avoid bring unnecessary field, also used projection option in find())

Collection and list size

users = 2000000

user_list = 100000

posts = 2000000

When I execute query, it takes almost 4 seconds.

Among them, make user_list takes almost 3 seconds.

Question

  1. How can I make a result to list only contains user_id efficiently?
  2. Any way to improve performance here?

Thanks.

Upvotes: 1

Views: 76

Answers (1)

felix
felix

Reputation: 9285

First, make sure that the fields you query on are properly indexed. If it's already done, you can try this:

1. use distinct()

you could use distinct to get the user_list in one single query:

something like this:

user_list = user_collection.distinct("user_id", {"region": "US", ...})

2. Aggregation with a $lookup

second option is to retrieve the posts in a single query by performing a $lookup from the user_collection:

user_collection.aggregate([
  {
    "$match": {"region": "US", ...}
  }, 
  {
    "$lookup": {
        "from": "post_collection",
        "localField": "user_id",
        "foreignField": "user_id",
        "as": "post"
    }
  }, 
  ...
])

and then filter the posts with a $unwind and a $match stage

Upvotes: 1

Related Questions