ellgren

Reputation: 1

Webscraping via BeautifulSoup

The code below should produce a company name, but it raises the error shown further down. Any ideas? I am trying to scrape the following tag:

<span datatype="xsd:string" property="gazorg:name">ISCA SCAFFOLD LIMITED </span>

using the following code:

import requests
from bs4 import BeautifulSoup
data = requests.get('https://www.thegazette.co.uk/notice/3188283')
data.text[:1000]
soup = BeautifulSoup(data.text, 'html.parser')
soup.prettify()[:1000]
span = soup.find('span', {'property' : 'gazorg:name'})
company = span.text

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-4449f0e20d72> in <module>
----> 1 company = span.text
AttributeError: 'NoneType' object has no attribute 'text'

Upvotes: 0

Views: 50

Answers (1)

Bitto

Reputation: 8205

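You are getting that error because your request does not carry a browser-like User-Agent: unless you set one, requests sends its own default (python-requests/<version>). Websites may return different responses depending on the User-Agent, and some do not return the expected page when it is missing or unrecognized. The HTML you received evidently does not contain the span, so soup.find returns None and span.text fails.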
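To see which User-Agent actually goes out, the following minimal sketch prepares two requests without sending them over the network and compares their headers. The variable names (default_ua, browser_ua, plain, custom) are illustrative only, and the browser string is simply the one used in this answer.

```python
import requests

url = "https://www.thegazette.co.uk/notice/3188283"

# User-Agent that requests attaches when none is supplied.
default_ua = requests.utils.default_user_agent()

# Browser-like User-Agent (the same string used in this answer).
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)

session = requests.Session()

# prepare_request builds the outgoing request (merging session defaults)
# but does not send anything.
plain = session.prepare_request(requests.Request("GET", url))
custom = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": browser_ua})
)

print("without headers:", plain.headers["User-Agent"])
print("with headers:   ", custom.headers["User-Agent"])
```

The first line printed should start with python-requests/, which is what the site sees when no header is passed.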

It is advisable to set the User-Agent to match the one your browser sent while you were inspecting the site.

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent instead of the requests default
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
data = requests.get('https://www.thegazette.co.uk/notice/3188283', headers=headers)
soup = BeautifulSoup(data.text, 'html.parser')
# Find the span whose property attribute marks the company name
span = soup.find('span', {'property': 'gazorg:name'})
company = span.text
print(company)

Output

ISCA SCAFFOLD LIMITED 
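Since soup.find returns None whenever the tag is absent, it may also help to guard against that case so a blocked or changed page gives a clear result rather than an AttributeError. Below is a minimal sketch, assuming a hypothetical helper named extract_company_name (not part of BeautifulSoup); it runs on HTML strings, so no network access is needed to try it.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_company_name(html: str) -> Optional[str]:
    """Return the text of the gazorg:name span, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"property": "gazorg:name"})
    if span is None:
        # The page did not contain the tag (for example, a block page).
        return None
    # strip=True removes the trailing space inside the span.
    return span.get_text(strip=True)


# The tag from the question, used as sample input.
sample = (
    '<span datatype="xsd:string" property="gazorg:name">'
    "ISCA SCAFFOLD LIMITED </span>"
)
found = extract_company_name(sample)
missing = extract_company_name("<html><body>Access denied</body></html>")

print(found)
print(missing)
```

With a real response you would pass data.text to the helper and treat a None result as a sign that the site returned something other than the notice page.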

Upvotes: 1
