Jay Jung

Reputation: 1885

How to create a list of words as long as the word is not a word within a list of tuples

I have some words:

wordlist = ['change', 'my', 'diaper', 'please']

I also have a list of tuples that I need to check against:

mylist = [('verb', 'change'), ('prep', 'my')]

What I want to do is create a list out of all the words that are not in the list of tuples.

So the result of this example would be ['diaper', 'please']

What I tried seems to create duplicates:

[word for tuple in mylist for word in wordlist if word not in tuple]

How do I generate a list of the words not in the tuple-list, and do it as efficiently as possible?

No use of sets.

Edit: I chose the accepted answer based on whether it followed the no-sets restriction.

Upvotes: 2

Views: 79

Answers (4)

Thierry Lathuille

Reputation: 24282

Make a set of known words from your tuples list:

myList = [('verb', 'change'), ('prep', 'my')]
known_words = set(tup[1] for tup in myList)

then use it as you did before:

wordlist = ['change', 'my', 'diaper', 'please']
out = [word for word in wordlist if word not in known_words]

print(out)
# ['diaper', 'please']

Checking whether an item exists in a set is O(1) on average, while checking in a list or tuple is O(n) in its length, so sets are well worth using in cases like this.
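A rough way to see the lookup difference is to time both containers on larger inputs. This is only a sketch: the data sizes and the helper names filter_with_list and filter_with_set are made up for illustration, and the actual timings will vary by machine.

```python
import timeit

# Hypothetical data, sized only to make the difference visible.
known_list = [str(i) for i in range(2_000)]
known_set = set(known_list)
wordlist = [str(i) for i in range(1_500, 2_500)]

def filter_with_list():
    # Each "not in" scans the list element by element.
    return [w for w in wordlist if w not in known_list]

def filter_with_set():
    # Each "not in" is a hash lookup.
    return [w for w in wordlist if w not in known_set]

# Both approaches keep the same words, in the same order.
assert filter_with_list() == filter_with_set()

print("list lookup:", timeit.timeit(filter_with_list, number=3))
print("set lookup: ", timeit.timeit(filter_with_set, number=3))
```

On inputs as small as the four-word example the difference is negligible; it only starts to matter as the list of known words grows.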

Also, if you don't care about the order of the words and want to remove duplicates, you could do:

unique_new_words = set(wordlist) - known_words
print(unique_new_words)
# {'diaper', 'please'}

Upvotes: 2

Radan

Reputation: 1650

I have made an assumption that the second item of each tuple holds a single word; if not, this would need a small change.

[word for word in wordlist if word not in [t[1] for t in mylist]]

Upvotes: 1

bracco23

Reputation: 2221

Here is a one-liner using a list comprehension:

[word for word in wordlist if word not in [ w[1] for w in mylist ]]

The inner list, [ w[1] for w in mylist ], extracts the second element of each tuple in the list.

The outer list, [word for word in wordlist if word not in innerlist], keeps only the words that do not appear in that extracted list.

P.S. I assumed you wanted to filter only the second element of the tuple list.
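As written, the inner list is rebuilt for every word in wordlist. A minimal sketch that builds it once and still avoids sets, as the question asks (tagged_words is a name introduced here for illustration):

```python
wordlist = ['change', 'my', 'diaper', 'please']
mylist = [('verb', 'change'), ('prep', 'my')]

# Build the list of tagged words once, outside the comprehension.
tagged_words = [w[1] for w in mylist]

# Keep only the words that are not among the tagged words.
result = [word for word in wordlist if word not in tagged_words]
print(result)
# ['diaper', 'please']
```

Each membership test is still a linear scan of tagged_words, so this only removes the repeated list construction, not the per-word lookup cost.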

Upvotes: 2

hiro protagonist

Reputation: 46899

This is a version where I flatten your tuples (using itertools.chain) into a set and compare against that set; using a set speeds up the lookup for the in operator:

from itertools import chain

wordlist = ['change', 'my', 'diaper', 'please']
mylist = [('verb', 'change'), ('prep', 'my')]
veto = set(chain(*mylist))   # {'prep', 'change', 'verb', 'my'}

print([word for word in wordlist if word not in veto])
# ['diaper', 'please']
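One thing to keep in mind with flattening is that the tag strings themselves ('verb', 'prep') also end up in the veto set. The sketch below compares that with vetoing only the second element of each tuple; the wordlist here is a hypothetical input that happens to contain a tag name, and veto_all and veto_words are illustrative names.

```python
from itertools import chain

mylist = [('verb', 'change'), ('prep', 'my')]
wordlist = ['verb', 'change', 'diaper']  # hypothetical input containing a tag name

# Flattening puts every tuple element, tags included, into the set.
veto_all = set(chain.from_iterable(mylist))

# Unpacking keeps only the second element of each tuple.
veto_words = {word for _tag, word in mylist}

filtered_all = [w for w in wordlist if w not in veto_all]
filtered_words = [w for w in wordlist if w not in veto_words]

print(filtered_all)    # ['diaper']
print(filtered_words)  # ['verb', 'diaper']
```

Which behaviour is wanted depends on whether the tags should count as words to exclude.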

Upvotes: 1
