William Jungerman

Reputation: 33

How do I remove common elements from two lists?

I have two lists like the examples below (in reality, a is longer), and I would like to remove from a every element that also appears in the second list, in this case the punctuation marks in the list punctuation.

a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member', 'comical', 'family', 'Intelligences', '.'],['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

Upvotes: 0

Views: 2434

Answers (4)

MERose

Reputation: 4421

When the order is not important:

You can use a set() difference, but first you have to flatten the nested list a (taken from "Making a flat list out of list of lists in Python"):

b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

cleaned is a list that looks like ['A', 'hug', 'heads', 'family', 'Intelligences', 'becomes', 'Jeans', 'lengthen', 'member', 'turn', 'mankind', 'view,', 'legs', 'man,', 'hips', 'comical']
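One side effect of the set approach is worth checking before relying on it: besides losing the order, a set keeps each distinct value only once, so repeated words collapse. The following is a minimal sketch using the data from the question; the count() calls are only there for illustration.

```python
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Flatten the nested list, then take the set difference.
b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

# 'mankind' appears twice in the input but only once in the result,
# because a set stores each distinct value a single time.
print(b.count('mankind'))        # 2
print(cleaned.count('mankind'))  # 1
```

If duplicates have to survive, the order-preserving variant is the safer choice.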

When the order is important:

Use a list comprehension over the flattened list b. It is probably slower than the set version, since every membership test scans the punctuation list:

cleaned = [x for x in b if x not in punctuation]

cleaned looks like ['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

Upvotes: 0

dawg

Reputation: 103784

You can do the following (in Python 3, filter() returns an iterator, so wrap it in list()):

>>> from itertools import chain
>>> list(filter(lambda e: e not in punctuation, chain.from_iterable(a)))
['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

Or, if you want to maintain your sublist structure:

>>> [list(filter(lambda e: e not in punctuation, sub)) for sub in a]
[['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences'], ['Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']]

Upvotes: 0

Bastian35022

Reputation: 1142

You can do this, but the list order might change.

[list(set(sublist)-set(punctuation)) for sublist in a]

Using sets, you can remove the punctuation entries and convert the result back to a list. Use a list comprehension to do this for each sublist in the list.


If keeping the order is important, you can do this:

[[x for x in sublist if x not in punctuation] for sublist in a]

Upvotes: 0

jwilner

Reputation: 6606

Make a set of words to remove and test containment item by item if you need to preserve order.

cleaned = [word for word in words if word not in blacklist] 
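The names words and blacklist are not defined in the question, so here is a minimal sketch of how the same idea might look with the question's data. It assumes blacklist is simply a set built from punctuation and that words stands for one sublist of a (or the flattened list).

```python
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Assumption: the blacklist is the punctuation list converted to a set,
# so each "in" test is a hash lookup instead of a scan through a list.
blacklist = set(punctuation)

# Keep the sublist structure; order and duplicates are preserved.
cleaned = [[word for word in words if word not in blacklist] for words in a]

# Or flatten while filtering, if one flat list is wanted.
flat = [word for words in a for word in words if word not in blacklist]

print(cleaned)
print(flat)
```

Only the blacklist is turned into a set; the words themselves are still walked in order, which is why nothing gets reordered or deduplicated.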

Upvotes: 1
