Reputation: 11

java- removing substring in a list of strings

Consider the case of a list of strings example : list=['apple','bat','cow,'dog','applebat','cowbat','dogbark','help']

The java code must check if any element of string is a subset of another element and if it is then larger string element must be removed.

so in this case strings 'applebat','cowbat','dogbark, are removed.

The approach I have taken was to take two lists and iterate over them in the following way,

ArrayList<String> list1 = new ArrayList<String>(strings);
ArrayList<String> list2 = new ArrayList<String>(strings);
for(int i = 0; i<list1.size();i++)
    {
        String curr1 = list1.get(i);

        for(int j = 0;j<list2.size();j++)
        {
            String curr2 = list2.get(j);

            if(curr2.contains(curr1)&&!curr2.equals(curr1))
            {
                list2.remove(j);
                j--;
        }
        }
    }

IMPORTANT I have lists with the sizes of 200K to 400K elements.I would like to find a way to improve performance. I even tried hashsets but they were not much help.I am facing issues with the time taken by the program.

Can any one suggest any improvements to my code or any other approaches in java to improve performance??

Upvotes: 1

Answers (5)

Iamat8

Reputation: 3906

I have tested it on small data and hope it helps you to find solution...

import java.util.ArrayList;
import java.util.Arrays;

public class Main {
    public static void main(String[] args){
        String []list = {"apple","bat","cow","dog","applebat","cowbat","dogbark","help","helpless","cows"};
        System.out.println(Arrays.toString(list));
        int prelenght = 0;
        int prolenght = 0;
        long pretime = System.nanoTime();
        for(int i=0;i<list.length;i++){
            String x = list[i];
            prelenght = list[i].length();
            for(int j=i+1;j<list.length;j++){               
                String y = list[j];
                if(y.equals(x)){
                    list[j] = "0";
                }else if(y.contains(x)||x.contains(y)){
                    prolenght = list[j].length();                   
                    if(prelenght<prolenght){
                        list[j] = "0";
                    }                       
                    if(prelenght>prolenght){
                        list[i] = "0";
                        break;
                    }
                }
            }
        }       
        long protime = System.nanoTime();
        long time = (protime - pretime);
        System.out.println(time + "ns");
        UpdateArray(list);      
    }

    public static void UpdateArray(String[] list){
        ArrayList<String> arrayList = new ArrayList<>();
        for(int i=0;i<list.length;i++){
            if(!list[i].equals("0")){
                arrayList.add(list[i]);
            }
        }
        System.out.println(arrayList.toString());
    }
}

Output :

[apple, bat, cow, dog, applebat, cowbat, dogbark, help, helpless, cows]
time elapsed : 47393ns
[apple, bat, cow, dog, help]

Upvotes: 0

聂俊豪

Reputation: 1

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Random;

public class SubStrRmove {
    public static List<String> randomList(int size) {
        final String BASE = "abcdefghijklmnopqrstuvwxyz";
        Random random = new Random();
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            int length = random.nextInt(3) + 2;
            StringBuffer sb = new StringBuffer();
            for (int j = 0; j < length; j++) {
                int number = random.nextInt(BASE.length());
                sb.append(BASE.charAt(number));
            }
            list.add(sb.toString());
            sb.delete(0, sb.length());
        }
        return list;
    }

    public static List<String> removeListSubStr(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            String temp = input[i];
            if (!result.stream().filter(s -> temp.indexOf(s) >= 0).findFirst().isPresent()) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static List<String> removeListSubStr2(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            boolean isDiff = true;
            for (int j = 0; j < result.size(); j++) {
                if (input[i].indexOf(result.get(j)) >= 0) {
                    isDiff = false;
                    break;
                }
            }
            if (isDiff) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> list = randomList(20000);
        Long start1 = new Date().getTime();
        List<String> listLambda = removeListSubStr(list);
        Long end1 = new Date().getTime();
        Long start2 = new Date().getTime();
        List<String> listFor = removeListSubStr2(list);
        Long end2 = new Date().getTime();
        System.out.println("mothod Labbda:" + (end1 - start1) + "ms");
        System.out.println("mothod simple:" + (end2 - start2) + "ms");
        System.out.println("" + listLambda.size() + listLambda);
        System.out.println("" + listFor.size() + listFor);

    }

}

Upvotes: 0

Ramcharan Reddy

Reputation: 689

import java.util.ArrayList;
import java.util.*;
// our main class becomes a file but the main method is still found
public class HelloWorld
{
  public static void main(String[] args)
  {
    String[] strings = {"apple","bat","cow","dog","applebat","cowbat","dogbark","help"};
    ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(strings));
ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(strings));
ArrayList<String> result = new ArrayList<String>(Arrays.asList(strings));
for(int i = 0; i<8;i++)
{

    String curr1 = list1.get(i);
    System.out.println(curr1);
    int flag = 0;
    for(int j = i+1;j<8;j++)
    {
        String curr2 = list2.get(j);

        if((curr2.contains(curr1)&&!curr2.equals(curr1)))
        {

            result.remove(curr2);
        }
    }

}
 System.out.println(result);

  }
}

Upvotes: 2

Sergey Frolov

Reputation: 1377

I suppose set will be faster here. You can easy do that with java8 stream api.

Try that:

private Set<String> delete() {
        Set<String> startSet = new HashSet<>(Arrays.asList("a", "b", "c", "d", "ab", "bc", "ce", "fg"));
        Set<String> helperSet = new HashSet<>(startSet);

        helperSet.forEach(s1 -> helperSet.forEach(s2 -> {
            if (s2.contains(s1) && !s1.equals(s2)) {
                startSet.remove(s2);
            }
        }));

        return startSet;
    }

Do not delete any elements from set you are iterating for or you will have ConcurrentModificationException.

Upvotes: 0

Andreas

Reputation: 159096

For full performance boost of huge list of words, I would think a combination of sort and a string searching algorithm, such as the Aho–Corasick algorithm, is what you need, assuming you're willing to implement such complex logic.

First, sort the words by length.

Then build up the Aho–Corasick Dictionary, in word length order. For each word, first check if a substring exists in the dictionary. If it does, skip the word, otherwise add the word to the dictionary.

When done, dump the dictionary, or the parallel-maintained list if dictionary is not easy/possible to dump.

Upvotes: 0

java- removing substring in a list of strings

Answers (5)

Related Questions