Stop Words not being removed from list

I am doing some analysis on a large corpus, but my function to remove custom stop words is not working. I tried several solutions from questions already asked here, but I can't find why words are not being removed from the test list.

Any help pointing out my stupidness is welcome.

test = [['acesso em',
  'agindo',
  'alegre',
  'ambiente escolar',
  'ambientes digitais',
  'anual',
  'aplicativos digitais',
  'apresentar conclusões',
  'argumentação cuidado',
  'articuladas projeto',
  'associadas eixos',
  'associação',
  'ativas',
  'atos linguagem',
  'avaliar oportunidades',
  'bairro',
  'base critérios',
  'base estudos',
  'bibliográfica exploratória',
  'blogs',
  'buscando apresentar',
  'campo artístico']]
removed = ['anual']
new_words = [word for word in test if word not in removed]
new_words

Upvotes: 1

Views: 906

Answers (3)

rickhg12hs

Reputation: 11912

Your data structure is a bit unusual: test is a list that contains a single inner list, not a flat list of strings. If you want to create a new list of lists with the stop words removed, this may work for you.

new_words = [[word for word in mywords if word not in removed] for mywords in test]
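
To see why the original comprehension leaves the data untouched, here is a minimal sketch on a shortened copy of the question's data (the variable name unchanged is only for illustration). Iterating over the outer list yields the inner list itself, and a list is never equal to the string 'anual'.

```python
# Shortened version of the question's data: one list wrapped in another.
test = [['agindo', 'anual', 'bairro']]
removed = ['anual']

# The original comprehension iterates over the outer list, so `word` is the
# whole inner list, which never matches any string in `removed`.
unchanged = [word for word in test if word not in removed]

# The nested comprehension filters the strings inside each inner list.
new_words = [[word for word in mywords if word not in removed] for mywords in test]

print(unchanged)
print(new_words)
```

The first result still contains 'anual'; the second keeps the nested shape but drops it.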

Upvotes: 0

Omar The Dev

Reputation: 123

Maybe the function is not working as you expect. As an alternative, you can remove the stop words in place with a loop. Adapt the following to your own data; it should be straightforward.

words = ['a', 'b', 'a', 'c', 'd']
stopwords = ['a', 'c']
for word in list(words):  # iterating on a copy since removing will mess things up
    if word in stopwords:
        words.remove(word)
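
If you would rather not mutate the list, a comprehension with a set of stop words is a possible alternative. This is only a sketch using the same toy data; the name filtered is made up for the example. For a large corpus the set should also be faster, since each membership check does not scan a list.

```python
words = ['a', 'b', 'a', 'c', 'd']
stopwords = {'a', 'c'}  # a set gives constant-time membership checks on average

# Build a new list instead of mutating `words`, so no copy is needed.
filtered = [word for word in words if word not in stopwords]

print(filtered)  # ['b', 'd']
```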

Upvotes: 1

user9613901

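The comprehension itself is fine, and it behaves the same in a plain Python interpreter and in Jupyter.

The problem is that test is a list containing a single inner list, so the loop compares that whole inner list against removed and never looks at the individual strings.

To fix it, remove one pair of [ and ] from the source data: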

test = ['acesso em',
'agindo',
'alegre',
'ambiente escolar',
'ambientes digitais',
'anual',
'aplicativos digitais',
'apresentar conclusões',
'argumentação cuidado',
'articuladas projeto',
'associadas eixos',
'associação',
'ativas',
'atos linguagem',
'avaliar oportunidades',
'bairro',
'base critérios',
'base estudos',
'bibliográfica exploratória',
'blogs',
'buscando apresentar',
'campo artístico']
removed = ['anual']
new_words = [word for word in test if word not in removed]
new_words

Output:

 ['acesso em',
 'agindo',
 'alegre',
 'ambiente escolar',
 'ambientes digitais',
 'aplicativos digitais',
 'apresentar conclusões',
 'argumentação cuidado',
 'articuladas projeto',
 'associadas eixos',
 'associação',
 'ativas',
 'atos linguagem',
 'avaliar oportunidades',
 'bairro',
 'base critérios',
 'base estudos',
 'bibliográfica exploratória',
 'blogs',
 'buscando apresentar',
 'campo artístico']

Alternatively, if you want to keep the nested list, you can reach the inner list through its index:

new_words = [word for word in test[0] if word not in removed]

In this case, there is no need to remove the extra [].
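
Indexing with test[0] only covers the first inner list. If the outer list ever holds several inner lists, one option is to flatten it with itertools.chain.from_iterable. A rough sketch, assuming made-up data with two inner lists (the question's data has only one):

```python
from itertools import chain

# Hypothetical data with two inner lists; the question's data has only one.
test = [['agindo', 'anual'], ['bairro', 'anual', 'blogs']]
removed = ['anual']

# chain.from_iterable walks every inner list in order, yielding each string.
flat_words = [word for word in chain.from_iterable(test) if word not in removed]

print(flat_words)
```

This returns a single flat list, so use it only if you do not need to keep the grouping.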

Upvotes: 0
