Reputation: 3642
I have a list of objects, say List&lt;Entity&gt;. The Entity class has an equals method based on a few attributes (a business rule) to differentiate one Entity object from another.
The task that we usually carry out on this list is to remove all the duplicates, something like this:
List<Entity> noDuplicates = new ArrayList<Entity>();
for (Entity entity : lstEntities) {
    int indexOf = noDuplicates.indexOf(entity);
    if (indexOf >= 0) {
        noDuplicates.get(indexOf).merge(entity);
    } else {
        noDuplicates.add(entity);
    }
}
Now, the problem that I have been observing is that this part of the code slows down considerably as soon as the list has more than 10000 objects. I understand that ArrayList is doing an O(N) search.
Is there a faster alternative? Using a HashMap is not an option, because the entity's uniqueness is built upon 4 of its attributes together; it would be tedious to put the key itself into the map. Will a sorted set help in faster querying?
Thanks
Upvotes: 5
Views: 631
Reputation: 20442
Two simple steps for an O(N*Log(N)) algorithm:
1. Sort the list with a Comparator that compares the same attributes your equals() uses.
2. Walk the sorted list once, merging each element into the previous one when they compare equal.
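The two steps above can be sketched as follows. The Entity here is a hypothetical stand-in (a single "key" field plus a count), since the real class isn't shown; the point is the sort followed by a single merging scan:

```java
import java.util.*;

class SortMergeDedupe {
    // Hypothetical minimal Entity: uniqueness is assumed to hinge on "key"
    static class Entity {
        final String key;
        int count;
        Entity(String key, int count) { this.key = key; this.count = count; }
        void merge(Entity other) { this.count += other.count; }
    }

    // Step 1: sort with a Comparator consistent with equals() - O(N log N).
    // Step 2: one linear scan, merging runs of equal neighbours - O(N).
    static List<Entity> dedupe(List<Entity> input) {
        List<Entity> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(e -> e.key));
        List<Entity> result = new ArrayList<>();
        for (Entity e : sorted) {
            Entity last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && last.key.equals(e.key)) {
                last.merge(e);   // duplicate of the previous element
            } else {
                result.add(e);   // first occurrence of this key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> in = Arrays.asList(
            new Entity("a", 1), new Entity("b", 2), new Entity("a", 3));
        List<Entity> out = dedupe(in);
        System.out.println(out.size() + " " + out.get(0).count); // 2 4
    }
}
```

Note that this changes the order of the elements; if the original list order matters, this approach needs extra bookkeeping.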
Upvotes: 0
Reputation: 9008
Unless you have a reason for needing the ordering of a List, you'd probably be best off with a Set - specifically, a HashSet.
I see your concern about using a hashed collection because "the entity's uniqueness is built upon 4 of its attributes together", but that's easily overcome. You just have to define a hashCode() method which is compatible with your existing equals() method, and then you can insert your entities into a Set, and as a magic side effect, never have to remove duplicates again.
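A minimal sketch of that idea, with two hypothetical identifying attributes standing in for the real four, using Objects.hash to build a hashCode() consistent with equals():

```java
import java.util.*;

class HashSetDedupe {
    static class Entity {
        // Hypothetical identifying attributes; the real class has four business fields
        final String code, region;
        Entity(String code, String region) { this.code = code; this.region = region; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Entity)) return false;
            Entity e = (Entity) o;
            return Objects.equals(code, e.code) && Objects.equals(region, e.region);
        }

        // Must be consistent with equals(): equal objects yield equal hash codes
        @Override public int hashCode() { return Objects.hash(code, region); }
    }

    public static void main(String[] args) {
        Set<Entity> set = new HashSet<>();
        set.add(new Entity("A1", "EU"));
        set.add(new Entity("A1", "EU")); // duplicate, silently rejected
        set.add(new Entity("B2", "US"));
        System.out.println(set.size()); // 2
    }
}
```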
Upvotes: 0
Reputation: 139921
Now, the problem that I have been observing is that this part of the code slows down considerably as soon as the list has more than 10000 objects. I understand that ArrayList is doing an O(N) search.
The algorithm you posted is actually worse than O(N):
- iterating over the input list lstEntities - O(N)
- ArrayList.indexOf(T), which has to scan the list - O(N) again
Your algorithm is actually O(N^2), since you are potentially scanning the whole list within each iteration of the loop.
It sounds like what you want to do is actually two operations: iterate over the input List and, while doing so, remove any duplicates. You can do this by scanning the list just once, rather than in nested loops. I would recommend breaking up your Entity class to move the fields that "identify" an Entity into another type, such as ID, or at the very least add a getID() method which can return these fields grouped into a single type. This way you can easily build a Map between the two types to be able to merge entities with "duplicate" identities. This might look something like this:
Map<ID, Entity> map = new HashMap<ID, Entity>(inputList.size());
for (Entity e : inputList) {
    Entity existing = map.get(e.getID());
    if (existing == null) {
        // not in map, add it
        map.put(e.getID(), e);
    } else {
        existing.merge(e);
    }
}
Iterating through the list is O(N), while HashMap.get(K) is a constant-time operation.
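The ID type recommended above isn't shown in the answer; a minimal sketch of what it might look like (field names hypothetical) is below. The essential part is that ID is immutable and overrides equals() and hashCode(), so it behaves correctly as a HashMap key:

```java
import java.util.*;

class IdKeyedMerge {
    // Hypothetical ID type grouping the identifying attributes of an Entity.
    // Immutable, with equals()/hashCode() overridden: safe to use as a map key.
    static final class ID {
        final String code;
        final int year;
        ID(String code, int year) { this.code = code; this.year = year; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ID)) return false;
            ID id = (ID) o;
            return year == id.year && Objects.equals(code, id.code);
        }
        @Override public int hashCode() { return Objects.hash(code, year); }
    }

    static class Entity {
        final ID id;
        int amount;
        Entity(ID id, int amount) { this.id = id; this.amount = amount; }
        ID getID() { return id; }
        void merge(Entity other) { this.amount += other.amount; }
    }

    public static void main(String[] args) {
        List<Entity> inputList = Arrays.asList(
            new Entity(new ID("X", 2020), 5),
            new Entity(new ID("X", 2020), 7));
        Map<ID, Entity> map = new HashMap<>(inputList.size());
        for (Entity e : inputList) {
            Entity existing = map.get(e.getID());
            if (existing == null) map.put(e.getID(), e);
            else existing.merge(e);
        }
        System.out.println(map.size() + " " + map.values().iterator().next().amount); // 1 12
    }
}
```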
Upvotes: 2
Reputation: 8771
An idea is to use a Set instead of a List; there are no duplicates in a Set. To remove duplicates from a list, you could just add the List to a new Set:
List<Entity> list = //your list.
Set<Entity> set = new HashSet<Entity>();
set.addAll(list);
But then again, maybe there is some reason for using a List in the first place? If not, you could use a Set instead, and not have to worry about any duplicates.
EDIT
There is no index reference to the elements in a Set (as compared to a List, where you can do get(int index)). The elements in a Set are floating around without a specific point of reference.
If you need to find a specific one, you need to iterate through them all. If that is not OK and/or you can't be without the indexed reference - which allows for get(int index) and remove(int index) - I guess a Set is not an option for you.
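One middle ground worth noting: if the only reason for the List is to preserve insertion order (rather than indexed access), a LinkedHashSet gives Set semantics plus predictable iteration order. A small sketch:

```java
import java.util.*;

class OrderedDedupe {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("b", "a", "b", "c", "a");
        // LinkedHashSet rejects duplicates but iterates in insertion order
        Set<String> set = new LinkedHashSet<>(list);
        System.out.println(set); // [b, a, c]
    }
}
```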
Upvotes: 2
Reputation: 23548
It all depends on what that merge operation is doing. Does merge change any of the attributes that are compared when you do equals? If not, then you'll be amazed at how much faster it will be if you do this:
First, define a hashCode for your Entity class that is compatible with your definition of equals. One common way to do this is:
public int hashCode() {
    // assuming the four attributes that determine equality are called
    // attrFoo, attrBar, attrBaz, and attrQux
    int hash = 1;
    hash += attrFoo == null ? 0 : attrFoo.hashCode();
    hash *= 37;
    hash += attrBar == null ? 0 : attrBar.hashCode();
    hash *= 37;
    hash += attrBaz == null ? 0 : attrBaz.hashCode();
    hash *= 37;
    hash += attrQux == null ? 0 : attrQux.hashCode();
    return hash;
}
Then, use a HashMap so that you can find these things:
Map<Entity, Entity> map = new HashMap<Entity, Entity>();
for (Entity entity : lstEntities) {
    if (map.containsKey(entity)) {
        map.get(entity).merge(entity);
    } else {
        map.put(entity, entity);
    }
}
return map.values(); // or keySet(). Whichever.
I should note that I feel a bit dirty writing the above code, because you really shouldn't make Map keys that aren't immutable, but this will work and be much, much faster than what you're doing now.
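On Java 8 and later, the containsKey/get/put dance can also be collapsed into a single Map.merge call, which does one hash lookup instead of two. A sketch, with a hypothetical one-field Entity standing in for the real class:

```java
import java.util.*;

class MergeWithJava8 {
    static class Entity {
        final String key;  // stand-in for the four identifying attributes
        int total;
        Entity(String key, int total) { this.key = key; this.total = total; }
        void merge(Entity other) { this.total += other.total; }
        @Override public boolean equals(Object o) {
            return o instanceof Entity && ((Entity) o).key.equals(key);
        }
        @Override public int hashCode() { return key.hashCode(); }
    }

    static Collection<Entity> dedupe(List<Entity> lstEntities) {
        Map<Entity, Entity> map = new HashMap<>();
        for (Entity entity : lstEntities) {
            // Map.merge: insert if absent, otherwise combine with the existing value
            map.merge(entity, entity, (existing, incoming) -> {
                existing.merge(incoming);
                return existing;
            });
        }
        return map.values();
    }

    public static void main(String[] args) {
        Collection<Entity> out = dedupe(Arrays.asList(
            new Entity("k", 1), new Entity("k", 2), new Entity("m", 3)));
        System.out.println(out.size()); // 2
    }
}
```

The same caveat about mutable keys applies; here it is safe only because merge() changes total, not the key field that equals() and hashCode() depend on.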
Upvotes: 1
Reputation:
Instead of a list structure, you could use a set (more appropriate if you are concerned about Entity uniqueness), as Lars has suggested. Additionally, if performance is a problem, I'd look at using a TreeSet and implement a Comparator for comparing Entity instances based on their attributes. The tree structure will allow for fast (logarithmic complexity) insert, delete and retrieval operations.
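A small sketch of the TreeSet approach, using strings compared case-insensitively as a stand-in for Entity instances compared on their attributes (the real Comparator would compare the four business fields):

```java
import java.util.*;

class TreeSetDedupe {
    public static void main(String[] args) {
        // The Comparator plays the role of equals(): elements that compare
        // as 0 are treated as duplicates, no hashCode() needed.
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("Alpha");
        set.add("alpha"); // duplicate under the comparator; add() returns false
        set.add("Beta");
        System.out.println(set.size()); // 2
    }
}
```

Each add, contains, and remove is O(log N), so filling the set from the list is O(N log N) overall; unlike a HashSet this needs no hashCode(), only a Comparator consistent with the business equality.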
Upvotes: 3