hellzone

Reputation: 5236

How can I perform multiple contains() operations faster?

I have a String list as below. I want to do some calculations based on whether this list has multiple elements with the same value.

I have nearly 120k elements, and when I run this code it is too slow. Is there a faster approach than the contains() method?

List<String> words = getWordsFromDB();  // words list has nearly 120k elements
List<String> tempWordsList = new LinkedList<String>();  // empty list

String[] keys = getKeysFromDB();
List<String> tempKeysList = new LinkedList<String>();

for (int x = 0; x < words.size(); x++) {

    if (!tempWordsList.contains(words.get(x))) {
        // first occurrence of this word: remember it together with its key
        tempWordsList.add(words.get(x));

        String key = keys[x];
        tempKeysList.add(key);
    } else {
        // duplicate word: append this key to the keys stored for the first occurrence
        int index = tempWordsList.indexOf(words.get(x));

        String m = tempKeysList.get(index);
        String n = keys[x];
        if (!m.contains(n)) {
            String newWord = m + ", " + n;
            tempKeysList.set(index, newWord);
        }
    }
}

EDIT: The words list comes from a database, and the problem is that there is a service continuously updating and inserting data into this table. I don't have any access to this service, and there are other applications using the same table.

EDIT 2: I have updated the question with the full code.

Upvotes: 1

Views: 89

Answers (4)

shmosel

Reputation: 50716

LinkedList.get() runs in O(N) time. Either use ArrayList with O(1) lookup time, or avoid indexed lookups altogether by using an iterator:

for (String word : words) {
    if (!tempList.contains(word)) {
        tempList.add(word);
    } else {
        int firstIndex = tempList.indexOf(word);
        //do some calculations
    }
}

Disclaimer: The above was written under the questionable assumption that words is a LinkedList. I would still recommend the enhanced-for loop, since it's more conventional and its time complexity is not implementation-dependent. Either way, the suggestion below still stands.


You can further improve by replacing tempList with a HashMap. This will avoid the O(N) cost of contains() and indexOf():

Map<String, Integer> indexes = new HashMap<>();
int index = 0;
for (String word : words) {
    Integer firstIndex = indexes.putIfAbsent(word, index++);
    if (firstIndex != null) {
        //do some calculations
    }
}

Based on your latest update, it looks like you're trying to group "keys" by their corresponding "word". If so, you might give streams a spin:

List<String> words = getWordsFromDB();
String[] keys = getKeysFromDB();
Collection<String> groupedKeys = IntStream.range(0, words.size())
        .boxed()
        .collect(Collectors.groupingBy(
                words::get,
                LinkedHashMap::new,   // if word order is significant
                Collectors.mapping(
                        i -> keys[i],
                        Collectors.joining(", "))))
        .values();

However, as mentioned in the comments, it would probably be best to move this logic into your database query.
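
For completeness, a rough sketch of what that could look like over JDBC. The table and column names (words_table, word, key_col) and the MySQL-style GROUP_CONCAT function are assumptions here, so adapt them to your schema and database:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.sql.DataSource;

public class GroupedKeysLoader {

    // Returns each distinct word mapped to its comma-joined keys,
    // letting the database do the grouping instead of the Java loop.
    public static Map<String, String> loadGroupedKeys(DataSource dataSource) throws SQLException {
        // MySQL syntax; other databases use LISTAGG or STRING_AGG instead of GROUP_CONCAT
        String sql = "SELECT word, GROUP_CONCAT(DISTINCT key_col SEPARATOR ', ') AS grouped_keys "
                   + "FROM words_table GROUP BY word";
        Map<String, String> result = new LinkedHashMap<>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                result.put(rs.getString("word"), rs.getString("grouped_keys"));
            }
        }
        return result;
    }
}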

Upvotes: 2

DhaRmvEEr siNgh

Reputation: 2098

// 1) If you can avoid using a linked list, use the solution below:

List<String> words = getWordsFromDB();  // words list has nearly 120k elements
// if you can avoid the linked list, use a set instead
Set<String> set = new HashSet<>();
for (String s : words) {
    if (!set.add(s)) {
        // do some calculations
    }
}


// 2) If you can't avoid using a linked list, use the code below:

List<String> words = getWordsFromDB();  // words list has nearly 120k elements
List<String> tempList = new LinkedList<String>();  // empty list
// if you can't avoid the LinkedList (tempList), you still need a set to detect duplicates
Set<String> set = new HashSet<>();
for (String s : words) {

    if (set.add(s)) {
        tempList.add(s);
    } else {
        int a = tempList.indexOf(s);
        // do some calculations
    }
}

Upvotes: 2

davidxxx

Reputation: 131346

Actually, tempList uses linear-time methods:

if (!tempList.contains(words.get(x))) {

and

int a = tempList.indexOf(words.get(x));

It means that on each invocation, the list is on average iterated through half of its elements. Besides, these calls are redundant: indexOf() alone could be invoked:

for (int x = 0; x < words.size(); x++) {

    int indexWord = tempList.indexOf(words.get(x));

    if (indexWord == -1) {
        tempList.add(words.get(x));
    } else {
        // do some calculations by using indexWord
    }
}

But to improve all accesses, you should change your structure: wrap or replace the LinkedList with a LinkedHashSet.
A LinkedHashSet would keep the current behavior because, like a List, it defines the iteration ordering, which is the order in which elements were inserted into the set, but it also uses hashing to give fast access to its elements.
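
For example, a minimal sketch of the LinkedHashSet variant for the duplicate check (the variable name seenWords is just illustrative):

Set<String> seenWords = new LinkedHashSet<>();
for (String word : words) {
    // add() returns false when the word is already in the set, in O(1) average time
    if (!seenWords.add(word)) {
        // do some calculations for the duplicate word
    }
}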

Upvotes: 0

user207421

Reputation: 310884

You are searching the list twice per word: once for contains() and once for indexOf(). You could replace contains() with indexOf(), test the result for -1, and otherwise reuse that result instead of calling indexOf() again. But you are certainly using the wrong data structure. What exactly do you need the index for? Do you need it at all? I would use a HashSet, or a HashMap if you need to associate other data with each word.
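
For example, the HashMap variant could look roughly like this for the word/key pairing in the question (keysByWord is just an illustrative name):

Map<String, String> keysByWord = new LinkedHashMap<>();  // a HashMap works too if order does not matter
int i = 0;
for (String word : words) {
    String key = keys[i++];
    // append the key to this word's entry, skipping keys that are already contained in it
    keysByWord.merge(word, key,
            (existing, added) -> existing.contains(added) ? existing : existing + ", " + added);
}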

Upvotes: 2
