RoktSe

Reputation: 469

Compare two sentences and check if they have a similar word

I'm trying to take two sentences and see if they have words in common. Example:
A- "Hello world this is a test"
B- "Test to create things"

The common word here is "test"

I tried using .contains() but it doesn't work because I can only search for one word.

text1.toLowerCase().contains(sentence1.toLowerCase())

Upvotes: 1

Views: 1995

Answers (5)

user4910279

Reputation:

Try this.

static boolean contains(String text1, String text2) {
    String text1LowerCase = text1.toLowerCase();
    return Arrays.stream(text2.toLowerCase().split("\\s+"))
        .anyMatch(word -> text1LowerCase.contains(word));
}

and

String text1 = "Hello world this is a test";
String text2 = "Test to create things";
System.out.println(contains(text1, text2));

output:

true
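One caveat: String.contains matches substrings, so a short word such as "to" would also match inside "tomorrow". A minimal sketch of a whole-word variant, assuming words are separated by whitespace only (the class name WholeWordMatch and the method sharesWord are hypothetical, chosen for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WholeWordMatch {

    // Hypothetical helper: true if the two texts share at least one whole word.
    // Assumes words are separated by whitespace; punctuation is not stripped.
    static boolean sharesWord(String text1, String text2) {
        Set<String> words1 = new HashSet<>(
                Arrays.asList(text1.toLowerCase().trim().split("\\s+")));
        return Arrays.stream(text2.toLowerCase().trim().split("\\s+"))
                .filter(word -> !word.isEmpty())   // skip the empty token produced by a blank input
                .anyMatch(words1::contains);       // set lookup compares whole words only
    }

    public static void main(String[] args) {
        // prints true: "test" appears in both sentences
        System.out.println(sharesWord("Hello world this is a test", "Test to create things"));
        // prints false: "to" is only a substring of "tomorrow", not a word in the first sentence
        System.out.println(sharesWord("See you tomorrow", "Go to work"));
    }
}
```

With the substring version above, the second pair would return true, because "see you tomorrow" contains the characters "to".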

Upvotes: 1

Matt Coubrough

Reputation: 3829

Here's one approach:

    // extract the words from the sentences by splitting on white space
    String[] sentence1Words = sentence1.toLowerCase().split("\\s+");
    String[] sentence2Words = sentence2.toLowerCase().split("\\s+");
        
    // make sets from the two word arrays
    Set<String> sentence1WordSet = new HashSet<String>(Arrays.asList(sentence1Words));
    Set<String> sentence2WordSet = new HashSet<String>(Arrays.asList(sentence2Words));
        
    // get the intersection of the two word sets
    Set<String> commonWords = new HashSet<String>(sentence1WordSet); 
    commonWords.retainAll(sentence2WordSet);        

This yields a Set containing the lower-case versions of the words common to both sentences. If it is empty, the sentences share no words. If you don't care about some words, such as prepositions, you can filter them out of the final set or, better yet, preprocess the sentences to remove them first.
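As a rough sketch of the preprocessing idea, the word sets can be stripped of unwanted words before intersecting them. The stop-word list and the names StopWordFilter, toWordSet and commonWords below are assumptions made for illustration, not part of the snippet above:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class StopWordFilter {

    // Illustrative stop-word list (an assumption); a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to", "of", "in", "on"));

    // Lower-cases the sentence, splits on whitespace and drops stop words.
    // A TreeSet is used only so that the printed order is predictable.
    static Set<String> toWordSet(String sentence) {
        Set<String> words = new TreeSet<>(
                Arrays.asList(sentence.toLowerCase().trim().split("\\s+")));
        words.removeAll(STOP_WORDS);
        words.remove("");   // produced when the sentence is blank
        return words;
    }

    // Intersection of the two filtered word sets.
    static Set<String> commonWords(String sentence1, String sentence2) {
        Set<String> common = toWordSet(sentence1);
        common.retainAll(toWordSet(sentence2));
        return common;
    }

    public static void main(String[] args) {
        // prints [parser, test]; "is", "a" and "the" are filtered out
        System.out.println(commonWords("This is a test of the parser", "The parser is a test"));
    }
}
```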

Note that a real-world (i.e. useful) implementation of similarity checking is usually far more complex, because you usually also want to match words that are similar but differ slightly. Useful starting points for this type of string similarity checking are Levenshtein distance and the Metaphone phonetic algorithm.
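For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming version (the class and method names are hypothetical, and any threshold for treating two words as "similar" would be your own choice):

```java
class Levenshtein {

    // Edit distance between a and b, keeping only two rows of the DP table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;   // i deletions to reach the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;   // reuse the arrays for the next row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("test", "tests"));     // prints 1
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Two words could then be treated as a match when their distance falls below a small threshold, for example 1 or 2 for short words.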

Note that copying the set into commonWords is redundant, because retainAll performs the intersection in place. You could improve performance by calling retainAll directly on sentence1WordSet, but I have favoured code clarity over performance.

Upvotes: 0

Eklavya

Reputation: 18430

You can split each sentence on spaces, collect the words, and then look up each word of one sentence in the other, collecting the ones that match.

Here is an example using the Java Stream API. The words of the first sentence are collected into a Set so that each lookup is O(1) on average.

String a = "Hello world this is a test";
String b = "Test to create things";
Set<String> aWords = Arrays.stream(a.toLowerCase().split(" "))
                            .collect(Collectors.toSet());
List<String> commonWords = Arrays.stream(b.toLowerCase().split(" "))
                                 .filter(bw -> aWords.contains(bw))
                                 .collect(Collectors.toList());
System.out.println(commonWords);

Output: [test]

Upvotes: 0

Unmitigated

Reputation: 89234

You can create HashSets from the words of both sentences after splitting on whitespace. You can then use Set#retainAll to find the intersection (the common words).

final String a = "Hello world this is a test", b = "Test to create things";
final Set<String> words = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
final Set<String> words2 = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
words.retainAll(words2);
System.out.println(words); //[test]

Upvotes: 2

Deepak

Reputation: 123

Split the two sentences on whitespace and add each word from the first string to a Set. Then loop over the words of the second string and check whether each one is in the set; if it is, it is a common word. (Calling add and testing for a false return also works for this input, but it would report a word repeated within the second string as common even when the first string does not contain it.)

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Sample {

    public static void main(String[] args) {
        String str1 = "Hello world this is a test";
        String str2 = "Test to create things";
        // Compare case-insensitively and split on runs of whitespace
        String[] str1words = str1.toLowerCase().split("\\s+");
        String[] str2words = str2.toLowerCase().split("\\s+");
        // Words of the first sentence, for fast lookup
        Set<String> set = new HashSet<String>(Arrays.asList(str1words));
        for (String word : str2words) {
            // contains() leaves the set unchanged, so repeats in str2 are not misreported
            if (set.contains(word))
                System.out.println(word + " is common word");
        }
    }

}

Upvotes: 0
