Reputation: 99
I read the contents of an Excel file using pandas:
import pandas as pd
df = pd.read_excel("FAM_template_Update 1911274_JS.xlsx" )
df
Then I tried to extract entities using spaCy:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(df)
for entity in doc.ents:
    print(entity.text)
I got this error: TypeError: Argument 'string' has incorrect type (expected str, got DataFrame)
It is raised on line 3: doc = nlp(df)
Upvotes: 2
Views: 3854
Reputation: 18367
This is expected, as spaCy is not prepared to deal with a DataFrame as-is: nlp expects a single string. You need to do some work before you can print the entities. Start by identifying the column that contains the text you want to run nlp on. Then extract its values as a list, and you're ready to go. Let's suppose the column that contains the text is named Question.
for i in df['Question'].tolist():
    doc = nlp(i)
    for entity in doc.ents:
        print(entity.text)
This will iterate over each text (row) in your DataFrame and print its entities.
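If the column has empty cells, pandas reads them as NaN (a float), and nlp would reject those for the same reason it rejects the DataFrame. Here is a minimal sketch of one way to handle that, assuming the text column is called Question as above. The helper name iter_entities is made up for illustration. It relies on nlp.pipe, which processes an iterable of strings and yields the docs in the same order.

```python
def iter_entities(values, nlp):
    """Yield (row_position, entity_text, entity_label) for each string in values.

    `values` can be any iterable, e.g. df["Question"]; `nlp` is a loaded
    spaCy pipeline. The function name is hypothetical, not part of spaCy.
    """
    # Keep only real strings and remember their position in the column;
    # NaN from empty Excel cells is a float and is skipped here.
    indexed = [(pos, v) for pos, v in enumerate(values) if isinstance(v, str)]
    texts = [v for _, v in indexed]

    # nlp.pipe yields one Doc per input text, in input order.
    for (pos, _), doc in zip(indexed, nlp.pipe(texts)):
        for ent in doc.ents:
            yield pos, ent.text, ent.label_


# Usage with the names from this thread (needs the en_core_web_sm model):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   for pos, text, label in iter_entities(df["Question"], nlp):
#       print(pos, text, label)
```

The position is returned as well, so you can tell which row each entity came from.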
Upvotes: 5
Reputation: 1896
You need to loop through the individual strings within your DataFrame. The spaCy pipeline (parser and entity extraction) expects a string.
For example:
# Assumes the default integer index (0..n-1), so .loc works with row numbers
for row in range(len(df)):
    doc = nlp(df.loc[row, "text_column"])
    for entity in doc.ents:
        print(entity.text)
Upvotes: 0