Reputation: 711
list1 = ['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre']
exclusion_list = ['university','institute','school','University','Institute','School']
output=['Google','AU-KBC Research Centre']
The output should contain only those elements that do not contain any of the words in exclusion_list.
I have searched all of SO, but none of the answers address this specific problem. I have tried using filter()
and also a dumb function, but I am looking for a non-brute-force solution. I am also curious about an additional solution that uses regex.
I am looking for an optimal way to remove any sort of university, school or institute from list1.
EDIT: I want to preserve the order of the list as well. I apologize for not making this point clear.
Upvotes: 0
Views: 871
Reputation: 88226
For an efficient solution, you might want to use sets here. Define exclusion_list as a set, and use a list comprehension to check membership of each of the words in list1:
list1 = ['Google', 'Stanford University',
'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)',
'AU-KBC Research Centre']
# define a set from the exclusion_list
exclusion_list = set(['university','institute','school','University','Institute','School'])
[i for i in list1 if not set(i.split()).intersection(exclusion_list)]
# ['Google', 'AU-KBC Research Centre']
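A small variation, assuming matching should ignore case: lowercase the exclusion words once (so the capitalised duplicates are unnecessary), then test each name's lowercased words with `set.isdisjoint`, which can stop at the first shared word instead of building a full intersection. Like the answer above, this splits on whitespace only, so a word with attached punctuation (e.g. "University,") would not be caught.

```python
list1 = ['Google', 'Stanford University',
         'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)',
         'AU-KBC Research Centre']
exclusion_list = ['university', 'institute', 'school',
                  'University', 'Institute', 'School']

# Normalise the exclusion words to lower case once; the set collapses
# 'University'/'university' etc. into a single entry.
excluded = {w.lower() for w in exclusion_list}

# Keep a name only if none of its lowercased words is in the excluded set.
# The list comprehension preserves the original order of list1.
result = [name for name in list1
          if excluded.isdisjoint(name.lower().split())]

print(result)  # ['Google', 'AU-KBC Research Centre']
```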
Upvotes: 4
Reputation: 627
Try this:
[name for name in list1 if not any(x.lower() in name.lower() for x in set(exclusion_list))]
Upvotes: 1
Reputation: 5180
Try this.
list1 = ['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre']
exclusion_list = ['university','institute','school','University','Institute','School']
exclusion_list = [i.lower() for i in exclusion_list]
for i in list1:
    if not any(map(lambda x: x in i.lower(), exclusion_list)):
        print(i)
Upvotes: 1
Reputation: 2584
One-liner:
[s for s in list1 if not any(e in s.lower() for e in exclusion_list)]
The same is possible with filter(), as you mentioned:
list(filter(lambda s: not any(e in s.lower() for e in exclusion_list), list1))
Upvotes: 1
Reputation: 277
import pandas as pd
k = pd.Series(['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre'])
k[~k.str.contains('|'.join(exclusion_list))].tolist()
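Since the question also asks about regex: the pandas call above joins the exclusion words into an alternation pattern. The same idea works with only the standard-library `re` module. This sketch assumes whole-word matching is wanted, so it wraps the alternation in `\b` word boundaries and uses `re.IGNORECASE` instead of listing every capitalisation (hence the lowercase-only `exclusion_list` here); `re.escape` guards against regex metacharacters in the words.

```python
import re

list1 = ['Google', 'Stanford University',
         'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)',
         'AU-KBC Research Centre']
exclusion_list = ['university', 'institute', 'school']

# Build one pattern: \b(?:university|institute|school)\b
# \b limits matches to whole words, so e.g. "Schoolhouse" is not excluded;
# IGNORECASE also covers "University", "UNIVERSITY", etc.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, exclusion_list)) + r')\b',
    re.IGNORECASE,
)

# search() scans the whole string; keep names with no match, in order.
result = [name for name in list1 if not pattern.search(name)]

print(result)  # ['Google', 'AU-KBC Research Centre']
```

Unlike the plain substring checks (`e in s.lower()`), the word boundaries avoid excluding names that merely contain an excluded word inside a longer word; drop the `\b`s if that substring behaviour is actually what you want.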
Upvotes: 1
Reputation: 412
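We can do something like the following:
out = []
excl = set(exclusion_list)
for item in list1:
    for word in item.split():
        if word in excl:
            break
    else:
        # for/else: runs only if the inner loop finished without a break,
        # i.e. no word of item is in the exclusion set
        out.append(item)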
Upvotes: 1