Reputation: 3
I am running into an issue while scraping a job-posting website. My URLs are stored in a CSV file, "urls.csv".
The code usually runs fine, but from time to time I get this error: "AttributeError: 'NoneType' object has no attribute 'text'", sometimes after 1 iteration, sometimes after 30. If the failure happened at, say, i=230, running the script again parses that URL fine and then stops again after some other number of iterations.
Can someone advise please? Thank you!
Also, the error occurs on the line that starts with textoffer = ......
Edit: Link to the csv: https://github.com/DonCheiron/Scraping-Be.Indeed/blob/master/urls.csv
import bs4 as bs
import urllib.request
import csv

with open('C:/Users/******/Desktop/urls.csv', 'r') as f:
    reader = csv.reader(f)
    pages = list(reader)

for i in range(0, 300):
    page = ''.join(map(str, pages[i]))
    print('Working on ' + str(i) + "...")
    sauce = urllib.request.urlopen(page).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    textoffer = soup.body.div.find('div', class_='jobsearch-JobComponent-description icl-u-xs-mt--md').text
    file = open(str(i) + '.txt', 'w')
    file.write(textoffer)
    file.close()
    print(str(i) + " Done!")
Upvotes: 0
Views: 952
Reputation: 12925
Using a few random URLs from the file you supplied, I tried:
import csv
import requests
from bs4 import BeautifulSoup

with open('urls.csv', 'r') as f:
    reader = csv.reader(f)
    pages = list(reader)

for counter, url in enumerate(pages):
    print(counter, ''.join(url))
    page_response = requests.get(''.join(url))
    print(page_response)
    soup = BeautifulSoup(page_response.content, 'html.parser')
    print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text
output:
0 https://be.indeed.com/rc/clk?jk=39582947a2d91970&fccid=adb55a49f6636f0e&vjs=3
<Response [200]>
None
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-511-2b829cd9fc45> in <module>()
4 print(page_response)
5 soup = BeautifulSoup(page_response.content, 'html.parser')
----> 6 print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text
7
8
AttributeError: 'NoneType' object has no attribute 'text'
The None printed just above the traceback shows that find matched nothing, and the traceback makes the consequence clear: asking for .text on a result that is None raises the AttributeError. As to why the same URL would only sometimes contain this class: either it is not actually the same URL (the response status is 200, so the request itself succeeded), or the page is dynamic and does not always contain the same elements.
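One way to cope with this is to check the result of find before reading .text, and to retry the URL a few times when the div is missing. The sketch below is only an illustration under assumptions: the helper names (extract_description, fetch_html, fetch_with_retry), the retry count, and the delay are all hypothetical, and retrying only helps if the missing div really is intermittent.

```python
import time
import urllib.request

# Class string copied verbatim from the question.
DESCRIPTION_CLASS = 'jobsearch-JobComponent-description icl-u-xs-mt--md'


def extract_description(html):
    """Return the description text, or None when the div is not in the page."""
    # Imported here so the retry helper below can be used without bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    # Searching from the document root (an assumption, the question starts
    # at soup.body.div) avoids a second failure if body or its div is missing.
    node = soup.find('div', class_=DESCRIPTION_CLASS)
    return node.text if node is not None else None


def fetch_html(url):
    """Download the raw page bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_retry(url, fetch, extract, attempts=3, delay=1.0):
    """Fetch and extract up to `attempts` times; return None if all fail."""
    for attempt in range(attempts):
        result = extract(fetch(url))
        if result is not None:
            return result
        # Pause before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

Inside the loop from the question, this could be called as textoffer = fetch_with_retry(page, fetch_html, extract_description). When it returns None, printing the index and moving on with continue keeps one missing description from stopping the whole run.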
Upvotes: 1