Reputation: 155
Say I have a class called MyClass as follows:
public class MyClass
{
//Identifier is alphanumeric. If the identifier starts with 'ZZ'
//it is a special identifier.
private String identifier = null;
//Date string format YYYY-MM-DD
private String dateString = null;
//Just a flag (not important for this scenario)
private boolean isCoolCat = false;
//Default Constructor and getters/setters implemented
//Overrides the standard Java equals() method.
//This way, when ArrayList calls contains() for MyClass objects
//it will only check the Date (for ZZ identifier)
//and identifier values against each other instead of
//also comparing the isCoolCat indicator value.
@Override
public boolean equals(Object obj)
{
if(this == obj)
{
return true;
}
if(obj == null)
{
return false;
}
if(getClass() != obj.getClass())
{
return false;
}
MyClass other = (MyClass) obj;
if(this.identifier == null)
{
if(other.identifier != null)
{
return false;
}
} else if(!this.identifier.equals(other.identifier)) {
return false;
}
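//At this point both identifiers are equal (or both null),
//so checking this.identifier is enough and avoids a NullPointerException.
if(this.identifier != null && this.identifier.startsWith("ZZ"))
{
//Null-safe comparison of the date strings
if(this.dateString == null ? other.dateString != null : !this.dateString.equals(other.dateString))
{
return false;
}
}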
return true;
}
}
In another class I have two Lists of MyClass objects, each containing 100,000 objects. I need to check whether the items in one list are in the other list, and I currently accomplish this as follows:
List<MyClass> inList = someMethodForIn();
List<MyClass> outList = someMethodForOut();
//The for loop iterates through inList and checks whether outList contains
//each MyClass object from inList; if it doesn't, the object is added.
for(MyClass inObj : inList)
{
if(!outList.contains(inObj))
{
outList.add(inObj);
}
}
My question is: is this the fastest way to accomplish this? If not, can you please show me a better implementation that will give me a performance boost? Currently, on my platform, it takes about 2 minutes for lists of 100,000 objects. The list size is not always going to be 100,000; say it can vary from 1 to 1,000,000.
Upvotes: 5
Views: 6233
Reputation: 47078
You want to use a Set for this. A HashSet has a contains method which can determine whether an object is in the set in O(1) time on average.
One thing to watch out for when converting from List<MyClass> to Set<MyClass>: MyClass needs to implement both hashCode() and equals(), and they must be consistent (objects that are equal must return the same hash code). To convert your List to a Set you can just use:
Set<MyClass> s1 = new HashSet<>(inList);
Set<MyClass> s2 = new HashSet<>(outList);
This Java doc explains how to find the union, intersection, and difference of two sets. In particular, it seems like you're interested in the Union:
// transforms s2 into the union of s1 and s2. (The union of two sets
// is the set containing all of the elements contained in either set.)
s2.addAll(s1);
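As a rough sketch of the consistency requirement above: the equals() in the question only looks at dateString when the identifier starts with "ZZ", so a matching hashCode() should probably do the same. The three-argument constructor below is an assumption added for brevity (the question only mentions a default constructor and setters), and the comparisons are made null-safe with java.util.Objects.

```java
import java.util.Objects;

// Sketch of MyClass with a hashCode() that matches the equals() rules from the question.
// The constructor is hypothetical; the original class uses a default constructor and setters.
class MyClass {
    private final String identifier;
    private final String dateString;
    private final boolean isCoolCat;

    MyClass(String identifier, String dateString, boolean isCoolCat) {
        this.identifier = identifier;
        this.dateString = dateString;
        this.isCoolCat = isCoolCat;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        MyClass other = (MyClass) obj;
        if (!Objects.equals(identifier, other.identifier)) return false;
        // The date only matters for the special "ZZ" identifiers.
        if (identifier != null && identifier.startsWith("ZZ")) {
            return Objects.equals(dateString, other.dateString);
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields that equals() compares: the identifier always,
        // the date only for "ZZ" identifiers. isCoolCat is deliberately ignored.
        boolean special = identifier != null && identifier.startsWith("ZZ");
        return Objects.hash(identifier, special ? dateString : null);
    }
}
```

If hashCode() included dateString or isCoolCat unconditionally, two objects that equals() treats as the same could land in different buckets, and the HashSet would fail to recognise them as duplicates.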
Upvotes: 5
Reputation: 61
Two minutes to compare two very large lists: you are probably not going to get much time savings here. So, depending on your application, can you set a flag so that anything dependent on this cannot run until it has finished, push the work into its own thread, and let the user do something else (while also telling them it is ongoing)? Or at least put up a progress bar. Letting the user know the app is busy and telling them roughly how long it will take is OK for a complex computation that only takes a few minutes, and is probably better than just shaving a few seconds off the time. Users are quite tolerant of delays if they know how long they will be and you tell them there is time to go get a coffee.
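One possible way to sketch the "flag plus own thread" idea is with CompletableFuture from the standard library. The class name BackgroundTask, the busy flag, and the runInBackground helper are all hypothetical names chosen for illustration; how the flag is shown to the user (progress bar, disabled buttons) depends on the application.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: runs a long computation off the calling thread
// and exposes a flag that dependent features can check.
class BackgroundTask {
    // True while a background computation started via runInBackground is running.
    static final AtomicBoolean busy = new AtomicBoolean(false);

    static <T> CompletableFuture<T> runInBackground(Supplier<T> work) {
        busy.set(true);
        // supplyAsync runs the work on the common ForkJoinPool;
        // whenComplete clears the flag on success and on failure.
        return CompletableFuture.supplyAsync(work)
                .whenComplete((result, error) -> busy.set(false));
    }
}
```

A caller could start the list comparison with BackgroundTask.runInBackground(() -> compareLists()) (where compareLists is whatever method does the work), keep the UI responsive, and attach thenAccept(...) to handle the result once it is ready.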
Upvotes: 0
Reputation: 1469
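Hashing! Hashing is always the answer!
The current complexity of this code is O(nm), where n is the size of inList and m is the size of outList, because ArrayList.contains scans the list linearly. For two lists of 100,000 elements that is up to 10^10 equals() calls.
You can use a HashSet to reduce the complexity to O(n + m), because contains will then take O(1) on average. Note that MyClass must also override hashCode() consistently with equals() for this to work.
This can be done like this: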
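//Build a hash-based index of outList once: O(m)
HashSet<MyClass> outSet = new HashSet<>(outList);
for(MyClass inObj : inList)
{
//add() returns false if an equal object is already in the set,
//so duplicates within inList are skipped as in the original loop
if(outSet.add(inObj))
{
outList.add(inObj);
}
}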
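For anyone who wants to try the idea in isolation, here is a small self-contained sketch. The class name ListMerge and the method addMissing are made up for this example, and it uses Strings instead of MyClass so that it runs without a hashCode() implementation of your own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper demonstrating the HashSet-based merge on plain Strings.
class ListMerge {
    // Appends to outList every element of inList that outList does not already contain.
    static <T> void addMissing(List<T> inList, List<T> outList) {
        Set<T> seen = new HashSet<>(outList);
        for (T item : inList) {
            // Set.add returns true only when the element was not present yet.
            if (seen.add(item)) {
                outList.add(item);
            }
        }
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("b", "c", "c", "d");
        List<String> out = new ArrayList<>(Arrays.asList("a", "b"));
        addMissing(in, out);
        System.out.println(out); // prints [a, b, c, d]
    }
}
```

Unlike converting everything to a Set, this keeps outList as a List and preserves its original order, which may or may not matter for your use case.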
Credits and Sources.
returning difference between two lists in java
Time complexity of contains(Object o), in an ArrayList of Objects
Upvotes: 0