satran
satran

Reputation: 1232

Optimal way to find elements from one collection not in another

I have two collections. One a list of items with id and content, lets call this list ItemList. I have another collection which tells me whether a user has picked an item. Calling this list Collected it will have the user id and item id. Both the number of users and items are really large. What is the optimal way to query for items from ItemList for a user which are not in list Collected.

Here are a few ideas I have:

  1. Use joins of a relational database to solve this. My only query is will this handle really large dataset.
  2. Use blooms filter to store the collected item list and while querying for items check if it is not in the filter.

If the above thought will not scale could you provide me with algorithms that would. These can't be in-memory solutions as I would definitely require to persist data.

Upvotes: 0

Views: 44

Answers (1)

olegarch
olegarch

Reputation: 3891

You can use bitsets, also.

Load into some bitset ItemList (ItemID is index in the bitset).
Load into another bitset IDs_for_user for this user.
Perform opearation: Resut = ItemList ANDNOT IDs_for_user. 

Free bitset library you can get here: http://bmagic.sourceforge.net/

Upvotes: 1

Related Questions