Reputation: 567
I would like to know how I can remove specific words, including stopwords, from a list of lists like this:
my_list=[[],
[],
['A'],
['SB'],
[],
['NMR'],
[],
['ISSN'],
[],
[],
[],
['OF', 'USA'],
[],
['THE'],
['HOME'],
[],
[],
['STAR'],
[]]
If it were a flat list of strings, I would apply something like the following:
from collections import Counter
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
text = ' '.join([word for word in my_list if word not in stop_words])
At the end, I need to plot the word counts with something like this:
from itertools import chain
import matplotlib.pyplot as plt
counts = Counter(chain.from_iterable(my_list))
plt.bar(*zip(*counts.most_common(20)))
plt.show()
Expected list to be plotted:
my_list=[[],
[],
['SB'],
[],
['NMR'],
[],
['ISSN'],
[],
[],
[],
['USA'],
[],
['HOME'],
[],
[],
['STAR'],
[]]
Upvotes: 0
Views: 1009
Reputation: 780724
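Loop through my_list, replacing each nested list with a copy that has the stop words removed. You can use set difference to remove the words; note that converting to a set also drops duplicate words within a sublist and does not preserve their order.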
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
my_list = [list(set(sublist).difference(stop_words)) for sublist in my_list]
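To see what the set-difference version does and does not do, here is a small sketch. It assumes a hard-coded lowercase set, `demo_stop_words`, as a stand-in for the NLTK list (so it runs without any downloads), and a made-up `sample` list; neither name comes from the question.

```python
# Hypothetical stand-in for stopwords.words('english'), which is lowercase.
demo_stop_words = {"a", "of", "the"}

# Made-up data: one duplicate word and one uppercase stop word.
sample = [[], ["of", "usa", "usa"], ["THE"], ["home"]]

# Same pattern as above: set difference per sublist.
filtered = [list(set(sublist).difference(demo_stop_words)) for sublist in sample]

print(filtered)  # [[], ['usa'], ['THE'], ['home']]
```

The lowercase "of" is removed, the duplicate "usa" collapses to a single entry, and the uppercase "THE" survives because the match is case-sensitive.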
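The NLTK stop words are all lowercase, so the set difference above will not match uppercase words such as 'THE'. Comparing case-insensitively is a little more involved, since you can't use the built-in set difference method; filter each sublist with a comprehension instead: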
my_list = [[word for word in sublist if word.lower() not in stop_words] for sublist in my_list]
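Putting the case-insensitive filter together with the counting step from the question might look like the sketch below. It again assumes a small hard-coded `demo_stop_words` set in place of the NLTK list and a shortened version of the question's data; `cleaned`, `labels`, and `values` are illustrative names.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for set(stopwords.words('english')).
demo_stop_words = {"a", "of", "the"}

# Shortened version of the nested list from the question.
my_list = [[], ["A"], ["SB"], ["OF", "USA"], ["THE"], ["HOME"], [], ["USA"]]

# Keep order and duplicates; compare lowercased words against the stop words.
cleaned = [
    [word for word in sublist if word.lower() not in demo_stop_words]
    for sublist in my_list
]

# Flatten the nested lists and count each remaining word.
counts = Counter(chain.from_iterable(cleaned))

# Split the most common entries into bar labels and bar heights.
top = counts.most_common(20)
labels, values = zip(*top) if top else ((), ())

print(cleaned)
print(top)
```

Unlike the set-based version, this keeps repeated words, so the counts reflect how often each word really occurs. With matplotlib imported as `plt`, `plt.bar(labels, values)` followed by `plt.show()` should then draw the chart. Sublists that contained only stop words end up empty rather than being dropped, which makes no difference to the counts.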
Upvotes: 3