Reputation: 773
Let's say you have a list of companies like this:
companies = [
['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']
]
Let's say you want to exclude a business if any string from another list appears as a substring of its information in the list above:
no_interest = ['museum', 'cinema', 'car']
This is what I have done (we only look at the second column of each entry):
# KEEPING ONLY RESULTS WHERE WE DO NOT FIND THE SUBSTRINGS
[x for x in companies if (no_interest[0] not in x[1]) & (no_interest[1] not in x[1]) & (no_interest[2] not in x[1])]
# Returns
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
It seems to work, although I would have expected to need an 'OR' statement rather than an 'AND' (&), which to me is a cumulative operator that should only match when ALL the conditions are met ('museum', 'cinema' and 'car' in the same string).
Why is the 'AND' statement acting like an 'OR'? How can we make this code more pythonic and more efficient?
We only check for 3 substrings here, but in practice there will be thousands of them, so it would be great not to repeat those conditions and instead use something like all() or any() that returns the matching results rather than a boolean.
Upvotes: 1
Views: 2107
Reputation: 1388
Here is another one using regex, but (like Henry Ecker's pandas answer) it assumes that none of the 'no_interest' elements contains a regex special character:
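import re  # the standard-library module is enough here
pattern = re.compile("|".join(no_interest))
# keep an entry only if the pattern matches neither of its two columns
out = [c for c in companies if pattern.search(c[0]) is None and pattern.search(c[1]) is None]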
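If the terms might contain regex special characters, one way to lift that assumption is to escape each term with re.escape before joining. This is only a sketch: the extra term 'tea.stop' is a made-up example added to show the difference, and it searches the second column only, as in the question.

```python
import re

companies = [
    ['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
    ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
    ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
    ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop'],
]
# 'tea.stop' is a hypothetical term, not part of the original question
no_interest = ['museum', 'cinema', 'car', 'tea.stop']

# Unescaped, the '.' matches any character, so 'tea-stop' would be hit too
unescaped = re.compile("|".join(no_interest))
# Escaped, every term is matched literally
escaped = re.compile("|".join(re.escape(term) for term in no_interest))

out = [c for c in companies if escaped.search(c[1]) is None]
print(out)
```

With the escaped pattern, 'boston-tea-stop' is kept because it does not contain the literal text 'tea.stop'; with the unescaped pattern it would have been dropped.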
Upvotes: 0
Reputation: 35626
Why is the 'AND' statement acting like an 'OR'?
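See De Morgan's laws: (not A) and (not B) and (not C) is equivalent to not (A or B or C). Requiring that every term is absent is the same as excluding an entry when any term is present.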
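To make the equivalence concrete, here is a small sketch that checks both forms select the same entries. The helper names keep_with_and and keep_with_not_or are made up for this illustration.

```python
companies = [
    ['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
    ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
    ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
    ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop'],
]

def keep_with_and(name):
    # AND of negations: every term must be absent
    return ('museum' not in name) and ('cinema' not in name) and ('car' not in name)

def keep_with_not_or(name):
    # negation of an OR: reject as soon as any term is present
    return not (('museum' in name) or ('cinema' in name) or ('car' in name))

# Both predicates agree on every company name
same = all(keep_with_and(c[1]) == keep_with_not_or(c[1]) for c in companies)
kept = [c for c in companies if keep_with_and(c[1])]
print(same)
print(kept)
```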
How can we make this code more pythonic and more efficient?
More pythonic:
One option is to use all with a nested list comprehension:
companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
no_interest = ['museum', 'cinema', 'car']
out = [x for x in companies if all([ni not in x[1] for ni in no_interest])]
print(out)
Or with not any:
out = [x for x in companies if not any([ni in x[1] for ni in no_interest])]
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
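With thousands of terms, it may also help to pass any a generator expression instead of a list, so that the scan of no_interest stops at the first term found rather than building the whole list of booleans first. A minimal sketch with the same data:

```python
companies = [
    ['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
    ['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
    ['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
    ['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop'],
]
no_interest = ['museum', 'cinema', 'car']

# No square brackets inside any(): the generator is consumed lazily,
# so any() short-circuits on the first matching term
out = [x for x in companies if not any(ni in x[1] for ni in no_interest)]
print(out)
```

The result is the same as above; only the amount of work per excluded entry changes.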
More efficient:
Use a library like pandas:
import pandas as pd
companies = [['zmpEVqsbCUO1aXStxHkSVA', 'palms-car-wash'],
['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['C0d5kzUx6C19mLcxQyhxCA', 'alamo-drafthouse-cinema-'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
df = pd.DataFrame(data=companies, columns=['id', 'val'])
no_interest = ['museum', 'cinema', 'car']
out = df[~df['val'].str.contains('|'.join(no_interest))]
print(out)
Output as DataFrame
                       id              val
1  5T0vKfIJWP1xTnxA7fJ17w   meat-and-bread
3  ch1ercqwoNLpQLxpTb90KQ  boston-tea-stop
Output as list
print(out.to_numpy().tolist())
[['5T0vKfIJWP1xTnxA7fJ17w', 'meat-and-bread'],
['ch1ercqwoNLpQLxpTb90KQ', 'boston-tea-stop']]
Upvotes: 3