Reputation: 6538
I have two big arrays of strings. I want to remove the elements from the first array that do not exist in the second array. First I create two arrays:
Array to modify:
String[] sarr = fdata.split(System.getProperty("line.separator"));
ArrayList<String> items = new ArrayList<>(Arrays.asList(sarr));
Filter array:
List<String> filter = Arrays.asList(voc.split(System.getProperty("line.separator")));
Then I create an Iterator to walk through the elements of items and check whether each one exists in filter. If it does not, I remove it from items:
Iterator<String> it = items.iterator();
while (it.hasNext()) {
    String s = it.next();
    if (!filter.contains(s)) {
        it.remove();
    }
}
items contains 286,568 strings and filter contains 100,000 strings. The operation takes far too long, so I suspect I am not doing it efficiently.
Is there a faster way?
Upvotes: 1
Views: 85
Reputation: 5533
Just use different collection types. For the filter, use a HashSet, whose contains() lookup is O(1) on average (instead of O(n) for an ArrayList). For the items, use a LinkedList instead of an ArrayList, which makes each removal O(1) instead of shifting all the following elements of the backing array.
I didn't test this code, but...
String[] sarr = fdata.split(System.getProperty("line.separator"));
LinkedList<String> items = new LinkedList<>(Arrays.asList(sarr));
Set<String> filter = new HashSet<>(Arrays.asList(voc.split(System.getProperty("line.separator"))));
items.retainAll(filter);
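A self-contained sketch of the suggestion above that can be compiled and run as-is. The class name RetainDemo, the method retainKnown and the sample strings are hypothetical stand-ins for fdata and voc from the question; it uses System.lineSeparator(), which returns the initial value of the line.separator system property.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class RetainDemo {
    // Hypothetical helper: keeps only the lines of 'data' that also occur as lines of 'vocabulary'.
    static List<String> retainKnown(String data, String vocabulary) {
        String sep = System.lineSeparator();
        LinkedList<String> items = new LinkedList<>(Arrays.asList(data.split(sep)));
        // Build the lookup set once; each contains() is then expected O(1).
        Set<String> filter = new HashSet<>(Arrays.asList(vocabulary.split(sep)));
        // retainAll calls filter.contains once per element and unlinks misses from the list.
        items.retainAll(filter);
        return items;
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        String fdata = String.join(sep, "apple", "kiwi", "pear", "apple");
        String voc = String.join(sep, "apple", "pear");
        System.out.println(retainKnown(fdata, voc)); // prints [apple, pear, apple]
    }
}
```

retainAll only removes elements, so duplicates and the original order of items are preserved.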
Upvotes: 6
Reputation: 41635
When you call collection.contains(element) often for a large collection, you should not use an ArrayList, but rather a HashSet.
Set<String> filter = new HashSet<>();
Collections.addAll(filter, voc.split(System.getProperty("line.separator")));
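A HashSet is a data structure optimized for lookups: contains() hashes the element and runs in expected constant time, whereas ArrayList.contains() scans the whole list.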
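A possible way to combine this set with the question's ArrayList, sketched under the assumption of Java 8 or later. It swaps the explicit Iterator loop for List.removeIf, which is not part of the answer above; ArrayList overrides removeIf to compact its backing array in a single pass, so the per-removal shifting of repeated Iterator.remove() calls goes away. RemoveIfDemo and keepOnlyKnown are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveIfDemo {
    // Hypothetical helper: removes, in place, every entry of 'items' not present in 'vocabulary'.
    static void keepOnlyKnown(List<String> items, String[] vocabulary) {
        // Fill the lookup set once, as in the answer above.
        Set<String> filter = new HashSet<>();
        Collections.addAll(filter, vocabulary);
        // One pass over items; each contains() is expected O(1).
        items.removeIf(s -> !filter.contains(s));
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList("apple", "kiwi", "pear", "fig"));
        keepOnlyKnown(items, new String[] {"pear", "apple"});
        System.out.println(items); // prints [apple, pear]
    }
}
```

With both the set lookup and the single-pass removal, the whole filter is roughly linear in the size of items plus the size of the vocabulary.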
Upvotes: 3