baxx
baxx

Reputation: 4705

Check if list element is a sub-element of those in another list with Python

Given the following data:

a = ["onee", "two", "three"]
b = ["one", "four"]

I would like for some test such as the following:

[True if x in a else False for x in b]

To return

[True, False]

Instead of

[False, False]

So for each element in list b, I want to see whether it's a substring of any of the elements in list a.

One way this can be done is as following:

test = []
for elb in b:
    included = False
    for ela in a:
        if elb in ela:
            included = True
        break
    test.append(included)

I don't feel that this is a very nice approach though, perhaps there's a comprehension that can improve on it?

The following also works:

[True if any(elb in ela for ela in a) else False for elb in b]

I'm just thinking that there's likely to be a nicer approach.

Upvotes: 2

Views: 875

Answers (4)

Joe Iddon
Joe Iddon

Reputation: 20414

First of all, this

True if True else False

is redundant. So in your first comp. you can just have: [x in a for x in b], similarly, [any(elb in ela for ela in a) for elb in b].

And I think this is as short, in terms of characters, that you are going to get it.

Efficiency wise, however, you could pre-generate all possible sub-strings from all the strings in a, storing these in a set.

This would mean that the complexity would be reduced from O(n*m*p), where n is the length of b, m is the length of a, and n is the average sub-string length of a, to simply O(n). This is because, once the sub-string lookup set has been created, checking a particular element in b is an O(1) operation since you are checking for inclusion in a set, rather than O(m*p) where you have to check every sub-string of every element in a.

To generate this sub-string lookup set you could use a set comprehension:

a_substrings = {s[i:j] for s in a for i in range(len(s)) for j in range(i+1, len(s)+1)}

then you can just check in this:

[s in a_substrings for s in b]

which gives the expected [True, False] for your inputs.


Is this really faster?

For small sized a and b lists, the overhead of creating the lookup set would outweigh the advantages of being able to check each element in b. Furthermore, for an extortionately long a list, containing long strings, and even a moderately sized b, it may again be slower to take the time going through all the sub-strings of a and creating the lookup set, especially if the majority of elements in b are going to match within the first few strings of a.

However, in cases where both lists are long, most importantly when b is long, your method would be continuously generating and checking the same elements of a over and over again for each element of b. Clearly this is slower than pre-calculating the sub-sets. I guess this is essentially a key optimisation of search engines - when someone presents a query, they don't start trawling website from a blank slate each time, instead they are continuously re-evaluating all known websites, of course in order of popularity, so that they are "ready to go" when a query comes in.

Upvotes: 3

revliscano
revliscano

Reputation: 2272

This is another approach I came over with:

[x in "-".join(y for y in a) for x in b]

Joining all the strings of a in one string and testing if the element is in there.

Outputs:

[True, False]

Disclaimer: Not sure if this is precisely "nicer" but well, again, just another approach.

Upvotes: 2

Miro
Miro

Reputation: 112

You can do:

>>> [ y.startswith(x) for x, y in zip(b,a)]
[True, False]
>>> 

Upvotes: 0

Błotosmętek
Błotosmętek

Reputation: 12927

This is enough:

[ any(elb in ela for ela in a) for elb in b ]

Upvotes: 0

Related Questions