Sssssuppp

Reputation: 711

How to check if element in list matches with another element in a list of strings?

Input lists

list1 = ['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre']
exclusion_list = ['university','institute','school','University','Institute','School']

Output list

output=['Google','AU-KBC Research Centre']

The output should contain only those elements which do not contain any of the words (elements) from exclusion_list. I have searched all of SO, but none of the existing questions answers this specific problem. I have tried using filter() and also a naive function, but I am looking for a non-brute-force solution. I am also curious about an additional solution that uses regex.

Basically, I am looking for an optimal way to remove any sort of university, school or institute from list1.

EDIT: I want to preserve the order of the list as well. I apologize for not making this point clear.

Upvotes: 0

Views: 871

Answers (6)

yatu

Reputation: 88226

For an efficient solution you might want to use sets here. Define exclusion_list as a set, and use a list comprehension that splits each string in list1 into words and checks whether any of those words is in the set:

list1 = ['Google', 'Stanford University',
         'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 
         'AU-KBC Research Centre']

# define a set from the exclusion_list
exclusion_list = set(['university','institute','school','University','Institute','School'])

[i for i in list1 if not set(i.split()).intersection(exclusion_list)]
# ['Google', 'AU-KBC Research Centre']
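
As a minimal sketch of the same set-based idea, the check can be wrapped in a function that lowercases both sides, so the exclusion list no longer needs a capitalized copy of every word. The function name filter_by_words is my own, not from the question, and it still matches only whole whitespace-separated words, so a token with attached punctuation such as "University," would not be caught.

```python
def filter_by_words(names, excluded_words):
    """Keep names that share no whole word with excluded_words (case-insensitive)."""
    # Lowercase the excluded words once; set lookups are O(1) on average.
    excluded = {w.lower() for w in excluded_words}
    # The list comprehension walks names in order, so the original order is preserved.
    return [
        name for name in names
        if excluded.isdisjoint(word.lower() for word in name.split())
    ]


list1 = ['Google', 'Stanford University',
         'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)',
         'AU-KBC Research Centre']
exclusion_list = ['university', 'institute', 'school']

result = filter_by_words(list1, exclusion_list)
print(result)
# ['Google', 'AU-KBC Research Centre']
```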

Upvotes: 4

Faizan Naseer

Reputation: 627

Try this:

[name for name in list1 if not any(x.lower() in name.lower() for x in exclusion_list)]

Upvotes: 1

Underoos

Reputation: 5180

Try this.

list1 = ['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre']
exclusion_list = ['university','institute','school','University','Institute','School']
exclusion_list = [i.lower() for i in exclusion_list]
for i in list1:
    if not any(map(lambda x:x in i.lower(), exclusion_list)):
        print(i)

Upvotes: 1

Yuri Feldman

Reputation: 2584

One-liner:

[s for s in list1 if not any(e in s.lower() for e in exclusion_list)]

Similarly possible with filter as you mentioned:

list(filter(lambda s: not any(e in s.lower() for e in exclusion_list), list1))
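
One caveat worth illustrating: the `in` test on a string is a substring test, so it also matches inside longer words. The sketch below compares it with a whole-word test; the name 'Preschool Supplies Ltd' is a made-up example, not from the question. Which behaviour is wanted depends on the data.

```python
exclusion_list = ['university', 'institute', 'school']
# 'Preschool Supplies Ltd' is a hypothetical name used only to show the difference.
names = ['Preschool Supplies Ltd', 'Google']

# Substring test: 'school' is found inside 'preschool', so that name is dropped.
substring_filtered = [s for s in names
                      if not any(e in s.lower() for e in exclusion_list)]

# Whole-word test: compare against the list of whitespace-separated words instead.
word_filtered = [s for s in names
                 if not any(e in s.lower().split() for e in exclusion_list)]

print(substring_filtered)  # ['Google']
print(word_filtered)       # ['Preschool Supplies Ltd', 'Google']
```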

Upvotes: 1

Aditya Shrivastava

Reputation: 277

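import pandas as pd

exclusion_list = ['university','institute','school','University','Institute','School']
k = pd.Series(['Google', 'Stanford University', 'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)', 'AU-KBC Research Centre'])
# join the excluded words into one regex alternation and keep the rows that do not match it
k[~k.str.contains('|'.join(exclusion_list))].tolist()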
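
Since the question also asks about regex, here is a rough sketch of the same alternation idea using only the standard re module, without pandas. The function name filter_by_regex is my own; the word boundaries (\b) and re.IGNORECASE are assumptions about the desired matching, and re.escape is there in case an excluded word ever contains regex metacharacters.

```python
import re


def filter_by_regex(names, excluded_words):
    """Keep names in which no excluded word appears as a whole word."""
    if not excluded_words:
        # An empty alternation would match at every word boundary, so return early.
        return list(names)
    # Build one pattern such as r"\b(?:university|institute|school)\b".
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, excluded_words)) + r")\b",
        re.IGNORECASE,
    )
    # The comprehension keeps the original order of names.
    return [name for name in names if not pattern.search(name)]


list1 = ['Google', 'Stanford University',
         'Karlsruhe Institute of Technology (KIT) / University of Karlsruhe (TH)',
         'AU-KBC Research Centre']
exclusion_list = ['university', 'institute', 'school']

result = filter_by_regex(list1, exclusion_list)
print(result)
# ['Google', 'AU-KBC Research Centre']
```

Compiling the pattern once means each string is scanned a single time, rather than once per excluded word.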

Upvotes: 1

Prasanth

Reputation: 412

We can do something like the following, using a for/else loop so that an item is appended only if no excluded word was found in it:

out = []
excl = set(exclusion_list)
for item in list1:
    for word in item.split():
        if word in excl:
            break
    else:
        out.append(item)

Upvotes: 1
