user15110545
user15110545

Reputation: 107

Determine if List of Strings contains substring of all Strings in other list

I have this situation:

val a = listOf("wwfooww", "qqbarooo", "ttbazi")
val b = listOf("foo", "bar")

I want to determine if all items of b are contained in substrings of a, so the desired function should return true in the situation above. The best I can come up with is this:

return a.any { it.contains("foo") } && a.any { it.contains("bar") }

But it iterates over a twice. a.containsAll(b) doesn't work either because it compares on string equality and not substrings.

Upvotes: 1

Views: 1715

Answers (2)

gidds
gidds

Reputation: 18577

Alex's answer is by far the simplest approach, and is almost certainly the best one in most circumstances.

However, it has complexity A*B (where A and B are the sizes of the two lists) — which means that it doesn't scale: if both lists get big, it'll get very slow.

So for completeness, here's a way that's more involved, and slower for the small cases, but has complexity proportional to A+B and so can cope efficiently with much larger lists.

The idea is to preprocess the a list, to generate a set of all the possible substrings, and then scan through the b list just checking for inclusion in that set.  (The preprocessing step takes time proportional* to A.  Converting the substrings into a set means that it can check whether a string is present in constant time, using its hash code; so the rest then takes time proportional to B.)

I think this is clearest using a helper function:

/**
 * Generates a list of all possible substrings, including
 * the string itself (but excluding the empty string).
 */
fun String.substrings()
    = indices.flatMap { start ->
        ((start + 1)..length).map { end ->
            substring(start, end)
        }
    }

For example, "1234".substrings() gives [1, 12, 123, 1234, 2, 23, 234, 3, 34, 4].

Then we can generate the set of all substrings of items from a, and check that every item of b is in it:

return a.flatMap{ it.substrings() }
        .toSet()
        .containsAll(b)

(* Actually, the complexity is also affected by the lengths of the strings in the a list.  Alex's version is directly proportional to the average length, while the preprocessing part of the algorithm above is proportional to its square (as indicated by the map nested in the flatMap).  That's not good, of course; but in practice while the lists are likely to get longer, the strings within them probably won't, so that's unlikely to be significant.  Worth knowing about, though.

And there are probably other, still more complex algorithms, that scale even better…)

Upvotes: 1

AlexT
AlexT

Reputation: 2964

Not sure if there is any way of doing that without iterating over a the same amount as b.size. Because if you only want 1 iteration of a, you will have to check all the elements on b and now you are iterating over b a.size times plus, in this scenario, you also need to keep track of which item in b already had a match, and not check them again, which might be worse than just iterating over a, since you can only do that by either removing them from the list (or a copy, which you use instead of b), or by using another list to keep track of the matches, then compare that to the original b.

So I think that you are on the right track with your code there, but there are some issues. For example you don't have any reference to b, just hardcoded strings, and doing it like that for all elements in b will result in quite a big function if you have more than 2, or better yet, if you don't already know the values.

This code will do the same thing as the one you put above, but it will actually use elements from b, and not hardcoded strings that match b. (it will iterate over b b.size times, and partially over a b.size times)

return b.all { bItem ->
        a.any { it.contains(bItem) }
    }

Upvotes: 5

Related Questions