Reputation: 31782
I'm using Core Data to store a lot (1000s) of items. A pair of properties on each item are used to determine uniqueness, so when a new item comes in, I compare it against the existing items before inserting it. Since the incoming data is in the form of an RSS feed, there are often many duplicates, and the cost of the uniquing step is O(N^2), which has become significant.
Right now, I create a set of existing items before iterating over the list of (possible) new items. My theory is that on the first iteration, all the items will be faulted in, and assuming we aren't pressed for memory, most of those items will remain resident over the course of the iteration.
I see my options thusly:
Is option 3 likely to be faster than my current approach? Do you know of a better way?
Upvotes: 1
Views: 566
Reputation: 1069
The third step of ohhorob's proposed solution is probably most efficently implemented as described in the Core Data docs in the section 'Implementing Find-or-Create Efficiently'. That is, sorting both the incoming items and their correspondant existing items after the hash property, then looping over the two collections in parallell.
Upvotes: 1
Reputation: 11805
As per Alex's answer, a predicate on an integer property should be quicker, but the strategy should be adjusted to better suit the task:
collect a list of all incoming item hashes
fetch all objects matching that hash list (only fetch the hash property)
iterate over the incoming items, skipping those that have their hash in the fetched matches
Additionally you can fetch a dictionary result to avoid having the managed objects set up that you will not use (unless you intend on updating the existing objects instead of just skipping the identical incoming items)
Upvotes: 0
Reputation: 96927
A comparison of integer hash values — generate the hash of the components of an incoming RSS news item, use a predicate to filter all items with the same hash attribute value, and see if any object is fetched — may yield better performance than string comparisons of all the attributes within every entity.
Upvotes: 2