Reputation: 33
I have two lists such as the examples below (in reality, a is longer) and I would like to remove all common elements, in this case the punctuation given in the list punctuation.
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member', 'comical', 'family', 'Intelligences', '.'],['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
Upvotes: 0
Views: 2434
Reputation: 4421
When the order is not important:
You can use a set() difference, but first you have to flatten the nested list a (taken from "Making a flat list out of list of lists in Python"):
b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))
cleaned is a list that looks like ['A', 'hug', 'heads', 'family', 'Intelligences', 'becomes', 'Jeans', 'lengthen', 'member', 'turn', 'mankind', 'view,', 'legs', 'man,', 'hips', 'comical'] (the order is arbitrary, and duplicates such as the second 'mankind' are dropped).
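To make the side effects of the set approach concrete, here is a minimal sketch using a shortened version of the lists from the question (the shortening is only for readability). It shows that the result has no guaranteed order and that repeated words collapse to a single entry.

```python
# Shortened sample data based on the question.
a = [['A', 'man,', 'mankind', ';', 'mankind', '.'],
     ['Jeans', ',', 'hug', 'hips', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Flatten the nested list, then take the set difference.
b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

# The order of a set is arbitrary, so sort before printing or comparing.
print(sorted(cleaned))

# 'mankind' occurs twice in b but only once in cleaned.
print(b.count('mankind'), cleaned.count('mankind'))
```

If repeated words matter, the order-preserving comprehension is probably the safer choice.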
When the order is important:
Simply use a list comprehension on the flattened list b, which is probably slower because each membership test scans the punctuation list:
cleaned = [x for x in b if x not in punctuation]
cleaned looks like ['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']
Upvotes: 0
Reputation: 103784
You can do:
>>> from itertools import chain
>>> list(filter(lambda e: e not in punctuation, chain(*a)))  # list() is needed on Python 3, where filter() returns an iterator
['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']
Or, if you want to maintain your sublist structure:
>>> [list(filter(lambda e: e not in punctuation, sub)) for sub in a]
[['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences'], ['Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']]
Upvotes: 0
Reputation: 1142
You can do this, but the list order might change.
[list(set(sublist)-set(punctuation)) for sublist in a]
Using sets, you can remove the punctuation entries and convert the result back to a list. The list comprehension does this for each sublist in the list. Note that duplicates within a sublist are dropped as well.
If keeping the order is important, you can do this:
[[x for x in sublist if x not in punctuation] for sublist in a]
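As a rough sketch of the order-preserving variant, the version below converts punctuation to a set once before filtering. The name to_remove is just an illustrative choice, and the sample data is a shortened form of the question's lists.

```python
# Shortened sample data based on the question.
a = [['A', 'man,', 'view,', 'mankind', ';', 'mankind', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Build the lookup set once; membership tests on a set do not scan a list.
to_remove = set(punctuation)

# Keep the sublist structure, the original order, and any duplicates.
cleaned = [[x for x in sublist if x not in to_remove] for sublist in a]
print(cleaned)
```

Only whole tokens are removed here, so entries such as 'man,' keep their trailing comma, just as in the outputs shown in the other answers.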
Upvotes: 0
Reputation: 6606
Make a set of words to remove and test containment item by item if you need to preserve order.
cleaned = [word for word in words if word not in blacklist]
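The names words and blacklist are not defined in the snippet above. One possible way to fill them in, assuming words is meant to be the flattened version of a and blacklist the punctuation converted to a set, could look like this:

```python
from itertools import chain

# Shortened sample data based on the question.
a = [['A', 'man,', 'mankind', ';', 'mankind', '.'],
     ['Jeans', ',', 'hug', 'hips', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Assumption: words is the flattened nested list from the question.
words = list(chain.from_iterable(a))

# Assumption: blacklist is the punctuation list turned into a set.
blacklist = set(punctuation)

# Test containment item by item so the original order is preserved.
cleaned = [word for word in words if word not in blacklist]
print(cleaned)
```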
Upvotes: 1