Reputation: 4705
Given the following data:
a = ["onee", "two", "three"]
b = ["one", "four"]
I would like for some test such as the following:
[True if x in a else False for x in b]
To return
[True, False]
Instead of
[False, False]
So for each element in list b
, I want to see whether it's a substring of any of the elements in list a
.
One way this can be done is as following:
test = []
for elb in b:
included = False
for ela in a:
if elb in ela:
included = True
break
test.append(included)
I don't feel that this is a very nice approach though, perhaps there's a comprehension that can improve on it?
The following also works:
[True if any(elb in ela for ela in a) else False for elb in b]
I'm just thinking that there's likely to be a nicer approach.
Upvotes: 2
Views: 875
Reputation: 20414
First of all, this
True if True else False
is redundant. So in your first comp. you can just have: [x in a for x in b]
, similarly, [any(elb in ela for ela in a) for elb in b]
.
And I think this is as short, in terms of characters, that you are going to get it.
Efficiency wise, however, you could pre-generate all possible sub-strings from all the strings in a
, storing these in a set
.
This would mean that the complexity would be reduced from O(n*m*p)
, where n
is the length of b
, m
is the length of a
, and n
is the average sub-string length of a
, to simply O(n)
. This is because, once the sub-string lookup set has been created, checking a particular element in b
is an O(1)
operation since you are checking for inclusion in a set, rather than O(m*p)
where you have to check every sub-string of every element in a
.
To generate this sub-string lookup set you could use a set comprehension:
a_substrings = {s[i:j] for s in a for i in range(len(s)) for j in range(i+1, len(s)+1)}
then you can just check in
this:
[s in a_substrings for s in b]
which gives the expected [True, False]
for your inputs.
For small sized a
and b
lists, the overhead of creating the lookup set would outweigh the advantages of being able to check each element in b
. Furthermore, for an extortionately long a
list, containing long strings
, and even a moderately sized b
, it may again be slower to take the time going through all the sub-strings of a
and creating the lookup set, especially if the majority of elements in b
are going to match within the first few strings of a
.
However, in cases where both lists are long, most importantly when b
is long, your method would be continuously generating and checking the same elements of a
over and over again for each element of b
. Clearly this is slower than pre-calculating the sub-sets. I guess this is essentially a key optimisation of search engines - when someone presents a query, they don't start trawling website from a blank slate each time, instead they are continuously re-evaluating all known websites, of course in order of popularity, so that they are "ready to go" when a query comes in.
Upvotes: 3
Reputation: 2272
This is another approach I came over with:
[x in "-".join(y for y in a) for x in b]
Joining all the strings of a
in one string and testing if the element is in there.
Outputs:
[True, False]
Disclaimer: Not sure if this is precisely "nicer" but well, again, just another approach.
Upvotes: 2
Reputation: 112
You can do:
>>> [ y.startswith(x) for x, y in zip(b,a)]
[True, False]
>>>
Upvotes: 0
Reputation: 12927
This is enough:
[ any(elb in ela for ela in a) for elb in b ]
Upvotes: 0