user9598926
user9598926

Reputation: 11

How to check If a string contains a part of another string? (not the whole string)

I have 2 strings:

1) John has 2 apples.

2) Cody plays xbox in John's basement.

Now these 2 strings have "John" in common

But there seems to be no programmatic way to check this. The closest I could get is how to check if a string contains a specific word: str1.toLowerCase().contains(str2.toLowerCase())

So how do I check If String 1) contains part of String 2)?

Upvotes: 1

Views: 1133

Answers (3)

Pablo
Pablo

Reputation: 307

Based on https://stackoverflow.com/a/4448435/3790546:

private static List<String> interection(String s1, String s2) {
    HashSet<String> h1, h2;
    h1 = toHashSet(s1);
    h2 = toHashSet(s2);
    h1.retainAll(h2);
    return h1.stream().collect(toList());
}

private static HashSet<String> toHashSet(String s) {
    return Arrays.stream(s.split("[\\s@&.'?$+-]+")).collect(Collectors.toCollection(HashSet::new));
}

public static void main (String [] args) {
    interection("John has 2 apples.", "Cody plays xbox in John's basement.").forEach(System.out::println);
}

Upvotes: 0

Mohammad
Mohammad

Reputation: 739

This could work

public static void main(String[] args) {
    String x = "John has 2 apples.";
    String y = "Cody plays xbox in John's basement.";
    // print words of x that matches any of y
    findMatch(Arrays.asList(x.split(" ")), y);
    // print words of y that matches any of x
    findMatch(Arrays.asList(y.split(" ")), x);

}

private static void findMatch(List<String> firstArr,String statement) {
    for (String string : firstArr) {
        if(statement.contains(string)) {
            System.out.println(string);
        }
    }
}

Upvotes: 1

Alexander Kleinhans
Alexander Kleinhans

Reputation: 6248

If you're talking about two strings, you would have to create a matrix of each with respect to eachother, i.e.

1) "abcd"

2) "cdef"

Matrix)

ac bc cc dc
ad bd cd dd
ae be ce de
af bc cf df

Then check for doubles to show up in diagnal patterns, (i.e., cc and dd ) in the above example.

This means, you will have to iterate through each string for each char of the other string, so I believe this would give you O(n^2) time complexity. For each diagonal match up, that would be a token that matches (and there could be more than one). This is not the same thing as longest common substring, as @lexicore said.

If they are really large strings, you probably wouldn't want to go through each char, but instead tokenize them (i.e. by splitting them by spaces) and creating a sorted list for each (or hash table or something) so you could itterate through each one in O(log n)ish time. I think that would give you something like O((log n)^2), but at least better than O(n^2).

Upvotes: 0

Related Questions