kely789456123

Reputation: 595

Got Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got str)

I get the following error when I try to process a list of sentences with spaCy and filter out stopwords.

TypeError: Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got str)

Here is the code:

f = "MotsVides.txt"
# Read one stopword per line; the with-block closes the file afterwards
with open(f, 'r', encoding='utf-8') as file:
    stopwords = [line.rstrip() for line in file]

# stopwords =['alors', 'au', 'aucun', 'aussi', 'autre', 'avant', 'avec', 'avoir', 'bon', 'car', 'ce', 'cela', 'ces', 'ceux', 'chaque', 'ci', 'comme', 'comment', 'ça', 'dans', 'des', 'du', 'dedans', 'dehors', 'depuis', 'deux', 'devrait', 'doit', 'donc', 'dos', 'droite', 'début', 'elle', 'elles', 'en', 'encore', 'essai', 'est', 'et', 'eu', 'étaient', 'état', 'étions', 'été', 'être', 'fait', 'faites', 'fois', 'font', 'force', 'haut', 'hors', 'ici', 'il', 'ils', 's', 'juste', 'la', 'le', 'les', 'leur', 'là', 'ma', 'maintenant', 'mais', 'mes', 'mine', 'moins', 'mon', 'mot', 'même', 'ni', 'nommés', 'notre', 'nous', 'nouveaux', 'ou', 'où', 'par', 'parce', 'parole', 'pas', 'personnes', 'peut', 'peu', 'pièce', 'plupart', 'pour', 'pourquoi', 'quand', 'que', 'quel', 'quelle', 'quelles', 'quels', 'qui', 'sa', 'sans', 'ses', 'seulement', 'si', 'sien', 'son', 'sont', 'sous', 'soyez', 'sujet', 'sur', 'ta', 'tandis', 'tellement', 'tels', 'tes', 'ton', 'tous', 'tout', 'trop', 'très', 'tu', 'valeur', 'voie', 'voient', 'vont', 'votre', 'vous', 'vu']


def spacy_process(texte):
    for lt in texte:
        mytokens = nlp(lt)
        print(mytokens)
        mytokens2 = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "PUNCT" and word not in stopwords]
    
    print(type(mytokens2))
    
a = ['je suis la bonne personne et droit à la caricature.', 'Je suis la bonne personne et droit à la caricature.']
spacy_process(a)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-133-03cc18018278> in <module>
     33 
     34 a = ['je suis la bonne personne et droit à la caricature.', 'Je suis la bonne personne et droit à la caricature.']
---> 35 spacy_process(a)

<ipython-input-133-03cc18018278> in spacy_process(texte)
     28         mytokens = nlp(lt)
     29         print(mytokens)
---> 30         mytokens2 = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "PUNCT" and word not in stopwords]
     31 
     32     print(type(mytokens2))

<ipython-input-133-03cc18018278> in <listcomp>(.0)
     28         mytokens = nlp(lt)
     29         print(mytokens)
---> 30         mytokens2 = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "PUNCT" and word not in stopwords]
     31 
     32     print(type(mytokens2))

TypeError: Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got str)

Upvotes: 5

Views: 6592

Answers (1)

bivouac0

Reputation: 2560

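The issue is that word, in the expression word not in stopwords, is a Token, not a string. The in test compares word against each string in the list, and, as the error message shows, Token's comparison method only accepts another Token as its 'other' argument, so the comparison raises a TypeError instead of returning False.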

With spacy you want to use word.text to get the string, not word.
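The failure can be reproduced without spaCy installed. The sketch below is only an illustration: FakeToken is a hypothetical stand-in whose comparison method rejects non-FakeToken arguments, which imitates the behavior the traceback suggests for the spaCy version in question. It is not spaCy's actual class.

```python
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (NOT the real class)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Imitate a comparison method that only accepts its own type.
        if not isinstance(other, FakeToken):
            raise TypeError(
                "Argument 'other' has incorrect type "
                f"(expected FakeToken, got {type(other).__name__})"
            )
        return self.text == other.text


stopwords = ['alors', 'au', 'aucun']
word = FakeToken('au')

# Membership test on the object itself: the list compares each string
# with the token, which ends up calling FakeToken.__eq__ with a str.
try:
    word in stopwords
    failed = False
except TypeError as exc:
    failed = True
    print(exc)

# Membership test on the underlying string works as expected.
found = word.text in stopwords
print(found)
```

The same reasoning applies to the real Token: pass its string form (word.text) to the membership test rather than the object.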

The following code should work...

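import spacy

stopwords = ['alors', 'au', 'aucun', 'aussi', 'autre'] # truncated for simplicity

# The sample text is French, so load a French model rather than 'en'
# (the 'en' shortcut no longer works in spaCy 3).
# Install it with: python -m spacy download fr_core_news_sm
nlp = spacy.load('fr_core_news_sm')

def spacy_process(texte):
    for lt in texte:
        mytokens = nlp(lt)
        # Compare the token's string (word.text), not the Token object itself
        mytokens2 = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "PUNCT" and word.text not in stopwords]
        # Print inside the loop so every sentence is shown, not only the last
        print(mytokens2)

a = ['je suis la bonne personne et droit à la caricature.', 'Je suis la bonne personne et droit à la caricature.']
spacy_process(a)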

BTW, checking for a value in a list is a linear scan, so it gets slower as the list grows. Convert your list to a set: a set lookup is a hash lookup that takes roughly constant time on average.
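A minimal sketch of the set conversion, using plain strings so it runs without spaCy. The helper name remove_stopwords and the sample words are made up for illustration. Lowercasing before the lookup is an extra assumption added here so that capitalized words such as 'Au' also match.

```python
# Truncated stopword list, as in the answer above
stopword_list = ['alors', 'au', 'aucun', 'aussi', 'autre']

# Build the set once, then reuse it for every lookup
stopword_set = frozenset(stopword_list)


def remove_stopwords(words, stopwords):
    """Return the words whose lowercase form is not a stopword."""
    return [w for w in words if w.lower() not in stopwords]


words = ['Au', 'revoir', 'et', 'aussi', 'merci']
filtered = remove_stopwords(words, stopword_set)
print(filtered)  # ['revoir', 'et', 'merci']
```

With spaCy tokens the same idea would apply to word.text (or its lowercase form) in place of the plain strings used here.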

Upvotes: 12
