Reputation: 469
I'm trying to take two sentences and see if they have words in common. Example:
A- "Hello world this is a test"
B- "Test to create things"
The common word here is "test"
I tried using .contains(), but it doesn't work because I can only search for one word:
text1.toLowerCase().contains(sentence1.toLowerCase())
Upvotes: 1
Views: 1995
Reputation:
Try this.
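// requires java.util.Arrays, java.util.HashSet and java.util.Set
static boolean contains(String text1, String text2) {
    // Compare whole words rather than substrings, so "is" does not match inside "this"
    Set<String> text1Words = new HashSet<>(Arrays.asList(text1.toLowerCase().split("\\s+")));
    return Arrays.stream(text2.toLowerCase().split("\\s+"))
            .anyMatch(text1Words::contains);
}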
and
String text1 = "Hello world this is a test";
String text2 = "Test to create things";
System.out.println(contains(text1, text2));
output:
true
Upvotes: 1
Reputation: 3829
Here's one approach:
// extract the words from the sentences by splitting on white space
String[] sentence1Words = sentence1.toLowerCase().split("\\s+");
String[] sentence2Words = sentence2.toLowerCase().split("\\s+");
// make sets from the two word arrays
Set<String> sentence1WordSet = new HashSet<String>(Arrays.asList(sentence1Words));
Set<String> sentence2WordSet = new HashSet<String>(Arrays.asList(sentence2Words));
// get the intersection of the two word sets
Set<String> commonWords = new HashSet<String>(sentence1WordSet);
commonWords.retainAll(sentence2WordSet);
This will yield a Set containing lower case versions of the common words between the two sentences. If it is empty there is no similarity. If you don't care about some words like prepositions you can filter those out of the final similarity set or, better yet, preprocess your sentences to remove those words first.
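As a rough sketch of the preprocessing idea, the stop words can be dropped while the word sets are being built. The class name, the helper names and the tiny STOP_WORDS list below are all made up for illustration; a real stop-word list would be much longer and depends on your use case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class CommonWordsDemo {
    // Hypothetical, deliberately tiny stop-word list (an assumption, not a standard list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to", "of", "in", "on", "this"));

    // Lower-cases the sentence, splits it on whitespace, and drops stop words.
    static Set<String> contentWords(String sentence) {
        Set<String> words = new LinkedHashSet<>();
        for (String word : sentence.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }

    // Returns the words that appear in both sentences, ignoring stop words.
    static Set<String> commonWords(String sentence1, String sentence2) {
        Set<String> common = contentWords(sentence1);
        common.retainAll(contentWords(sentence2));
        return common;
    }

    public static void main(String[] args) {
        // "this", "is" and "a" occur in both sentences but are filtered out.
        System.out.println(commonWords("Hello world this is a test",
                                       "This is a test to create things")); // [test]
    }
}
```

Without the filter, the same two sentences would also report "this", "is" and "a" as common words.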
Note that a real-world (i.e. useful) implementation of similarity checking is usually far more complex, because you typically want to match words that are similar but differ slightly. Useful starting points for this type of string similarity checking are Levenshtein distance and Metaphone.
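To give a feel for Levenshtein distance, here is a minimal sketch of the usual dynamic-programming formulation. The class name, the similar helper and its maxEdits threshold are assumptions for illustration only; a library implementation would normally be preferable in production code.

```java
class LevenshteinDemo {
    // Edit distance (insertions, deletions, substitutions) using two rolling rows.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.length()];
    }

    // Hypothetical helper: two words count as similar if at most maxEdits edits apart.
    static boolean similar(String word1, String word2, int maxEdits) {
        return distance(word1.toLowerCase(), word2.toLowerCase()) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(similar("Test", "tests", 1));   // true
    }
}
```

With a helper like this, the exact set intersection could be replaced by a pairwise comparison of the words, at the cost of more work per sentence pair.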
Note that the code above makes a redundant copy when it creates the commonWords set. Because retainAll performs the intersection in place, you could improve performance by calling it directly on sentence1WordSet, but I have favoured code clarity over performance.
Upvotes: 0
Reputation: 18430
You can split each sentence on spaces, collect the words, and then look up each word of one sentence in the other to collect the common words.
Here is an example using the Java Stream API. The words of the first sentence are collected into a Set so that each lookup is fast (O(1) on average).
String a = "Hello world this is a test";
String b = "Test to create things";
Set<String> aWords = Arrays.stream(a.toLowerCase().split(" "))
        .collect(Collectors.toSet());
List<String> commonWords = Arrays.stream(b.toLowerCase().split(" "))
        .filter(bw -> aWords.contains(bw))
        .collect(Collectors.toList());
System.out.println(commonWords);
Output: [test]
Upvotes: 0
Reputation: 89234
You can create HashSets from both sentences after splitting on whitespace. You can then use Set#retainAll to find the intersection (the common words).
final String a = "Hello world this is a test", b = "Test to create things";
final Set<String> words = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
final Set<String> words2 = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
words.retainAll(words2);
System.out.println(words); //[test]
Upvotes: 2
Reputation: 123
Split the two sentences on spaces and add each word from the first string to a Set. Then, in a loop, check each word from the second string against the set; if the set already contains it, it is a common word. (Calling add and testing for a false return also works, but it misreports a word that is merely repeated within the second string.)
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class Sample {
    public static void main(String[] args) {
        String str1 = "Hello world this is a test";
        String str2 = "Test to create things";
        // Compare case-insensitively by lower-casing both sentences
        String[] str1words = str1.toLowerCase().split(" ");
        String[] str2words = str2.toLowerCase().split(" ");
        // Words of the first sentence, stored in a set for fast lookup
        Set<String> set = new HashSet<String>(Arrays.asList(str1words));
        for (String word : str2words) {
            // contains() does not modify the set, so only words from str1 can match
            if (set.contains(word)) {
                System.out.println(word + " is common word");
            }
        }
    }
}
Upvotes: 0