vnkotak
vnkotak

Reputation: 127

Java - Search keywords list in another string list

I have a list of keywords in a List and I have data coming from some source which will be a list too.

I would like to find if any of keywords exists in the data list, if yes add those keywords to another target list.

E.g.

Keywords list = FIRSTNAME, LASTNAME, CURRENCY & FUND

Data list = HUSBANDFIRSTNAME, HUSBANDLASTNAME, WIFEFIRSTNAME, SOURCECURRENCY & CURRENCYRATE.

From above example, I would like to make a target list with keywords FIRSTNAME, LASTNAME & CURRENCY, however FUND should not come as it doesn't exists in the data list.

I have a solution below that works by using two for loops (one inside another) and check with String contains method, but I would like to avoid two loops, especially one inside another.

  for (int i=0; i<dataList.size();i++) {
      for (int j=0; j<keywordsList.size();j++) {
            if (dataList.get(i).contains(keywordsList.get(j))) {
                  targetSet.add(keywordsList.get(j));
                  break;
            }
      }
    }

Is there any other alternate solution for my problem?

Upvotes: 4

Views: 2685

Answers (2)

Shar1er80
Shar1er80

Reputation: 9041

Here's a one loop approach using regex. You construct a pattern using your keywords, and then iterate through your dataList and see if you can find a match.

public static void main(String[] args) throws Exception {
    List<String> keywords = new ArrayList(Arrays.asList("FIRSTNAME", "LASTNAME", "CURRENCY", "FUND"));
    List<String> dataList = new ArrayList(Arrays.asList("HUSBANDFIRSTNAME", "HUSBANDLASTNAME", "WIFEFIRSTNAME", "SOURCECURRENCY", "CURRENCYRATE"));
    Set<String> targetSet = new HashSet();

    String pattern = String.join("|", keywords);
    for (String data : dataList) {
        Matcher matcher = Pattern.compile(pattern).matcher(data);
        if (matcher.find()) {
            targetSet.add(matcher.group());
        }
    }

    System.out.println(targetSet);
}

Results:

[CURRENCY, LASTNAME, FIRSTNAME]

Upvotes: 1

Sayalic
Sayalic

Reputation: 7650

Try Aho–Corasick algorithm. This algorithm can get the count of appearance of every keyword in the data (You just need whether it appeared or not).

The Complexity is O(Sum(Length(Keyword)) + Length(Data) + Count(number of match)).

Here is the wiki-page:

In computer science, the Aho–Corasick algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all patterns simultaneously. The complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches.

I implemented it(about 200 lines) years ago for similar case, and it works well.

If you just care keyword appeared or not, you can modify that algorithm for your case with a better complexity:

O(Sum(Length(Keyword)) + Length(Data)).

You can find implementation of that algorithm from internet everywhere but I think it's good for you to understand that algorithm and implement it by yourself.


EDIT:

I think you want to eliminate two-loops, so we need find all keywords in one loop. We call it Set Match Problem that a set of patterns(keywords) to match a text(data). You want to solve Set Match Problem, then you should choose Aho–Corasick algorithm which is particularly designed for that case. In that way, we will get one loop solution:

for (int i=0; i < dataList.size(); i++) {       
  targetSet.addAll(Ac.run(keywordsList));
}

You can find a implementation from here.

Upvotes: 1

Related Questions