Reputation: 11
I have 2 strings:
1) John has 2 apples.
2) Cody plays xbox in John's basement.
Now these 2 strings have "John
" in common
But there seems to be no programmatic way to check this. The closest I could get is how to check if a string contains a specific word: str1.toLowerCase().contains(str2.toLowerCase())
So how do I check If String 1) contains part of String 2)?
Upvotes: 1
Views: 1133
Reputation: 307
Based on https://stackoverflow.com/a/4448435/3790546:
private static List<String> interection(String s1, String s2) {
HashSet<String> h1, h2;
h1 = toHashSet(s1);
h2 = toHashSet(s2);
h1.retainAll(h2);
return h1.stream().collect(toList());
}
private static HashSet<String> toHashSet(String s) {
return Arrays.stream(s.split("[\\s@&.'?$+-]+")).collect(Collectors.toCollection(HashSet::new));
}
public static void main (String [] args) {
interection("John has 2 apples.", "Cody plays xbox in John's basement.").forEach(System.out::println);
}
Upvotes: 0
Reputation: 739
This could work
public static void main(String[] args) {
String x = "John has 2 apples.";
String y = "Cody plays xbox in John's basement.";
// print words of x that matches any of y
findMatch(Arrays.asList(x.split(" ")), y);
// print words of y that matches any of x
findMatch(Arrays.asList(y.split(" ")), x);
}
private static void findMatch(List<String> firstArr,String statement) {
for (String string : firstArr) {
if(statement.contains(string)) {
System.out.println(string);
}
}
}
Upvotes: 1
Reputation: 6248
If you're talking about two strings, you would have to create a matrix of each with respect to eachother, i.e.
1) "abcd
"
2) "cdef
"
Matrix)
ac bc cc dc
ad bd cd dd
ae be ce de
af bc cf df
Then check for doubles to show up in diagnal patterns, (i.e., cc
and dd
) in the above example.
This means, you will have to iterate through each string for each char of the other string, so I believe this would give you O(n^2) time complexity. For each diagonal match up, that would be a token that matches (and there could be more than one). This is not the same thing as longest common substring, as @lexicore said.
If they are really large strings, you probably wouldn't want to go through each char, but instead tokenize them (i.e. by splitting them by spaces) and creating a sorted list for each (or hash table or something) so you could itterate through each one in O(log n)ish time. I think that would give you something like O((log n)^2), but at least better than O(n^2).
Upvotes: 0