Reputation: 131
Let's take 2 string arraylists
List<String> namesListA = new ArrayList<>(/*50 000 strings*/);
List<String> namesListB = new ArrayList<>(/*400 000 strings*/);
removeAll method seems not working. After:
namesListA.removeAll(namesListB);
namesListA.size() is still 50000. Edit: Input data was incorrect, it actually works but takes a long lime.
I wrote the following brute-force code:
boolean match;
for (String stringA: namesListA)
{
match = false;
for (String stringB: namesListB)
{
if (stringA.equals(stringB))
{
match = true;
break;
}
}
if (!match)
{
finallist.add(stringA);
}
}
But it takes 8 hours to perform. it there any known effective algorithm for searching strings? Like to sort strings in alphabetical order and then search letter by letter or something like this.
Upvotes: 1
Views: 1083
Reputation: 4825
You could put elements of list namesListB
into a new Set
(preferably HashSet
). Then it is much more effective to call namesListA.removeAll(setFromListB);
, since the implementation of ArrayList.removeAll
calls Collection.contains()
which is much more effective in a Set
(HashSet
) than in an ArrayList
(HashSet.contains()
has constant time performance, while ArrayList.contains()
has linear performance).
Anyway, namesListA.removeAll(namesListB);
should work, if namesListA
doesn't change, then the 2 lists have no elements in common.
Estimation of time complexity (N = namesListA.length
, M = namesListB.length
):
Creating the HashSet
from namesListB
: O(M)
Calling namesListA.removeAll(setListB)
: O(N * 1) = O(N)
In total: O(M + N) (which could be written as O(M) since M>N, but I'm not sure)
Upvotes: 6
Reputation: 1455
If it's not important which element is duplicate but only if there is any you can let the Collections do the trick for you.
int sizeA = listA.size();
int sizeB = listB.size();
Set merger = new HashSet((sizeA+sizeB)*someLoadFactor);
merger.addAll(listA);
merger.addAll(listB);
// Sets do not contain duplicates!
if (merger.size() < sizeA + sizeB){
return true;
}
return false;
This runs in O(i+j) so efficiently O(n)!
Upvotes: 0
Reputation: 1455
Here's a solution in O(n*logn). Should be faster than the approaches posted yet. Edit: If you don't need the exact element, my other approach is faster.
1.) Sort both lists
Use Collections.sort(...) for efficient sorting in O(n*logn).
2.) Compare with two iterators
Fetch two iterators over the two lists. Then:
while(leftIterator.hasNext() && rightIterator.hasNext(){
int comparisonResult = leftElement.compare(rightElement);
if (comparisonResult == -1){
leftElement = leftIterator.next();
}
else if (comparisonResult == 1){
rightElement = rightIterator.next();
}
else{
// found it!
return true;
}
}
(Sorry if I mistyped, don't have an IDE at my hand)
=> Sorting is in O(ilogi + jlogj))
=> Comparison is in O(i+j)
Result performance is efficiently in class O(n*logn). This should work nicely.
Upvotes: 1
Reputation: 2718
Doing a list of removeAll is probably a better solution, Considering you have both the variables with 50k and 400k sized Lists
namesListA.removeAll(namesListB);
Upvotes: 0
Reputation: 45005
I would recommend to use an HashSet
instead of a List
to store the String
s of the biggest collection in order to know whether the collection contains or not a given String
with a time complexity of O(1)
instead of O(n)
, then use removeAll(Collection<?> c)
to keep only the String
s that are not in the second collection as next:
List<String> namesListA = new ArrayList<>(/*50 000 strings*/);
Set<String> namesSetB = new HashSet<>(/*400 000 strings*/);
namesListA.removeAll(namesSetB);
Upvotes: 1
Reputation: 17605
One possibility would be to parallelize the removal. The lists namesListA
and namesListB
can be grouped by starting character; then the removal could be done group-wise in parallel and the resulting lists could be concatenated again.
Assuming some standard Latin alphabet, this would result in roughly 26 groups which could be processed in parallel. If 4 threads can be run in parallel, I would expect a significant speedup.
Upvotes: 1
Reputation: 5562
Create a set for the 400 000 names in namesListB
. Then use this set to remove the undesired elements of namesListA
.
List<String> namesListA = new ArrayList<>(/*50 000 strings*/);
List<String> namesListB = new ArrayList<>(/*400 000 strings*/);
Set<String> undesiredNames = new HashSet<>(namesListB);
for (String name : namesListA) {
if (undesiredNames.contains(name)) {
namesListA.remove(name);
}
}
Upvotes: 2