Ajax

Reputation: 99

Extract entities with spaCy

I have a Python file for web scraping, scrapper.py:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://en.wikipedia.org/wiki/Willis').text
soup = BeautifulSoup(source, 'lxml')

def my_function():
    heading = soup.find('h1', {'id': 'firstHeading'}).text
    print(heading)
    print()

    for item in soup.select("#mw-content-text"):
        required_data = [p_item.text for p_item in item.select("p")][1:3]
        print('\n'.join(required_data).encode('utf-8'))

    Willis = soup.find("caption", {"class": "fn org"}).text
    print(Willis)
    print()

I want to use spaCy to extract entities from the text scraped by scrapper.py, in pyspacy.py:

import spacy
import scrapper

entity_list = []

nlp = spacy.load("en_core_web_sm")


doc = nlp(scrapper.my_function())

for entity in doc.ents:
    entity_list.append((entity.text, entity.label_))
print(entity_list)

It just prints the scraped data to the terminal, along with this error:

Traceback (most recent call last):
  File "hakuna_spacy.py", line 12, in <module>
    doc = nlp(printwo.pubb())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python37\lib\site-packages\spacy\language.py", line 423, in __call__
    if len(text) > self.max_length:
TypeError: object of type 'NoneType' has no len()

What am I doing wrong? Can someone please explain?

Upvotes: 0

Views: 379

Answers (1)

w08r

Reputation: 1804

In your initial code snippet, the problem was that pubb prints text to stdout but does not return a value. Try instead:

def pubb():
    return 'hello, world'
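
With a return value in place, the spaCy call has a real string to measure, so the TypeError goes away. A minimal sketch along the lines of your pyspacy.py, assuming en_core_web_sm is installed:

import spacy

nlp = spacy.load("en_core_web_sm")

def pubb():
    return 'hello, world'

# nlp() now receives a str, so len(text) works inside spaCy's pipeline
doc = nlp(pubb())
print([(ent.text, ent.label_) for ent in doc.ents])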

[Edit]:

In the edited version, there are some other issues I can see.

The fetch works, so:

>>> source = requests.get('https://en.wikipedia.org/wiki/Willis').text
>>> len(source)
36836

bs4 correctly finds the heading too:

>>> soup = BeautifulSoup(source,'lxml')
>>> soup.find('h1',{'id':'firstHeading'}).text
'Willis'

bs4 also finds an item in the content section (just 1):

>>> len(soup.select("#mw-content-text"))
1

The trouble then is that it doesn't find any content per se:

>>> soup.select("#mw-content-text")[0].select("p")[1:3]
[]

And it doesn't find the caption:

>>> soup.find("caption",{"class":"fn org"})                                                                                                                                                                   
>>>

You also have the pre-existing issue that my_function does not return any text, so the call that passes its return value into spaCy is handed None, which raises the exception. What do you want my_function to return?
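
For example, if you want the heading plus the first couple of paragraphs, one rough sketch (assuming those selectors actually match on the page you fetch) is to collect the pieces and return them instead of printing:

def my_function():
    # gather the pieces rather than printing them
    heading = soup.find('h1', {'id': 'firstHeading'}).text
    paragraphs = [p.text for p in soup.select("#mw-content-text p")][1:3]
    # hand back a single string so the caller gets a str, not None
    return '\n'.join([heading] + paragraphs)

Then doc = nlp(scrapper.my_function()) in pyspacy.py has text to work with and doc.ents can be populated.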

Upvotes: 1
