Ajax

Reputation: 99

Extract entities from a DataFrame using spaCy

I read the contents of an Excel file using pandas:

import pandas as pd
df = pd.read_excel("FAM_template_Update 1911274_JS.xlsx" )
df

When I try to extract entities using spaCy:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(df)
for entity in doc.ents:
    print(entity.text)

I get this error:

TypeError: Argument 'string' has incorrect type (expected str, got DataFrame)

It is raised on line 3: doc = nlp(df)

Upvotes: 2

Views: 3854

Answers (2)

Celius Stingher

Reputation: 18367

This is expected: spaCy's nlp object processes a single string, not a whole DataFrame, so you need to do some work before you can print the entities. Start by identifying the column that contains the text you want to run nlp on, extract its values as a list, and then process each value in turn. Let's suppose the column that contains the text is named Question.

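# nlp expects a str, so drop empty cells, cast the rest to str, and process one cell at a time
for text in df['Question'].dropna().astype(str).tolist():
    doc = nlp(text)
    for entity in doc.ents:
        print(entity.text)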

This iterates over each text (row) in your DataFrame and prints the entities found in it.
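For larger DataFrames, spaCy's nlp.pipe streams the texts through the pipeline and is usually faster than calling nlp once per row; it also makes it easy to store the entities in a new column instead of printing them. The sketch below assumes spaCy v3. So that it runs without downloading en_core_web_sm, it uses a blank English pipeline with a rule-based entity_ruler as a stand-in for the trained NER model; the patterns, the sample texts and the new entities column are hypothetical.

```python
import pandas as pd
import spacy

# Stand-in for spacy.load("en_core_web_sm"): a blank English pipeline plus a
# rule-based entity ruler, so the example runs without a downloaded model.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Google"},  # hypothetical patterns
    {"label": "GPE", "pattern": "London"},
])

# Hypothetical data; in the question this comes from pd.read_excel(...).
df = pd.DataFrame({"Question": [
    "Does Google have an office in London?",
    "No entities here",
    None,  # empty Excel cells arrive as missing values
]})

# nlp only accepts strings, so replace missing cells with "" before processing.
texts = df["Question"].fillna("").astype(str)

# nlp.pipe yields one Doc per input text, in the same order as the input.
entity_lists = [[ent.text for ent in doc.ents] for doc in nlp.pipe(texts)]
df["entities"] = pd.Series(entity_lists, index=df.index, dtype=object)
print(df)
```

Keeping the empty cells as empty strings (rather than dropping them) keeps one Doc per row, so the results line up with the original index.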

Upvotes: 5

sdhaus

Reputation: 1896

You need to loop through the individual strings in your DataFrame, because the NLP parser and entity extraction expect a string.

For example:

# iloc is positional, so this also works if the DataFrame index is not 0..n-1
for row in range(len(df)):
    doc = nlp(df["text_column"].iloc[row])
    for entity in doc.ents:
        print(entity.text)
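If you want to keep the results rather than print them, one option is to collect them into a long-format DataFrame with one row per entity, recording which original row it came from and its label. This is a sketch assuming spaCy v3; a blank pipeline with hypothetical entity_ruler patterns stands in for en_core_web_sm, and the data and index values are made up. It iterates with Series.items(), which yields (index label, value) pairs, so it does not rely on the index being 0..n-1.

```python
import pandas as pd
import spacy

# Stand-in for spacy.load("en_core_web_sm") so the sketch runs offline.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Google"},  # hypothetical patterns
    {"label": "GPE", "pattern": "London"},
])

# Hypothetical data with a non-default index (10, 11).
df = pd.DataFrame(
    {"text_column": ["Google is hiring in London.", "Nothing to see"]},
    index=[10, 11],
)

records = []
# items() yields (index label, cell value) pairs for the column.
for idx, text in df["text_column"].items():
    doc = nlp(str(text))  # str() guards against non-string cells such as numbers
    for ent in doc.ents:
        records.append({"row": idx, "text": ent.text, "label": ent.label_})

# One row per entity; listing the columns keeps them even if nothing was found.
entities = pd.DataFrame(records, columns=["row", "text", "label"])
print(entities)
```

The row column can later be used to join the entities back onto the original DataFrame.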

Upvotes: 0
