DonCheiron

Reputation: 3

BS4: AttributeError: 'NoneType' object has no attribute 'text'

I'm running into an issue while scraping a job-posting website. My URLs are stored in a CSV file, "urls.csv".

Usually the code runs fine, but from time to time I get this error: "AttributeError: 'NoneType' object has no attribute 'text'", sometimes after 1 iteration, sometimes after 30. If the failure happened at, say, i=230 and I run the script again, it parses that URL fine and then stops again after some other number of iterations.

Can someone advise please? Thank you!

Also, the error occurs on the line textoffer = ......

Edit: Link to the csv: https://github.com/DonCheiron/Scraping-Be.Indeed/blob/master/urls.csv

import bs4 as bs
import urllib.request
import csv

with open('C:/Users/******/Desktop/urls.csv', 'r') as f:
    reader = csv.reader(f)
    pages = list(reader)
    for i in range (0,300):
        page = ''.join(map(str, pages[i]))
        print('Working on ' + str(i)+ "...")
        sauce = urllib.request.urlopen(page).read()
        soup =bs.BeautifulSoup(sauce,'lxml')
        textoffer = soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md').text
        file = open(str(i)+ '.txt','w')
        file.write(textoffer)
        file.close()
        print(str(i) + " Done!")

Upvotes: 0

Views: 952

Answers (1)

gregory

Reputation: 12925

Using a few random URLs from the file you supplied, I tried:

import csv
import requests
from bs4 import BeautifulSoup

with open('urls.csv', 'r') as f:
    reader = csv.reader(f)
    pages = list(reader)
for counter, url in enumerate(pages):
    print(counter, ''.join(url))
    page_response = requests.get(''.join(url))
    print(page_response)
    soup = BeautifulSoup(page_response.content, 'html.parser')
    print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text

output:

0 https://be.indeed.com/rc/clk?jk=39582947a2d91970&fccid=adb55a49f6636f0e&vjs=3
<Response [200]>

None
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-511-2b829cd9fc45> in <module>()
      4     print(page_response)
      5     soup = BeautifulSoup(page_response.content, 'html.parser')
----> 6     print(soup.body.div.find('div',class_='jobsearch-JobComponent-description icl-u-xs-mt--md')).text
      7
      8

AttributeError: 'NoneType' object has no attribute 'text'

The traceback shows the problem clearly: find() returned None because no matching element was found, and accessing .text on None raises the AttributeError. As to why the same URL only sometimes contains this class: either it is not really the same URL each time, or the page is dynamic and does not always contain the same elements.
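One way to make the loop tolerate a missing element is to check the result of find() before touching .text, and to request the page again when the element is absent. The following is only a minimal sketch under assumptions: extract_description and fetch_with_retry are hypothetical helper names, the fetch and extract arguments are callables you supply, and the retry count and delay are arbitrary. It also calls soup.find directly rather than soup.body.div.find, since soup.body can itself be None on an unexpected response.

```python
import time

# Class string taken from the question's code.
JOB_CLASS = 'jobsearch-JobComponent-description icl-u-xs-mt--md'


def extract_description(html):
    """Return the job description text, or None if the div is absent."""
    # Imported lazily so the retry helper below depends only on the stdlib.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    node = soup.find('div', class_=JOB_CLASS)
    # find() returns None when nothing matches, so check before using .text
    return node.text if node is not None else None


def fetch_with_retry(fetch, extract, attempts=3, delay=1.0):
    """Call fetch() then extract(html) until extract returns a value.

    fetch and extract are hypothetical callables supplied by the caller.
    Returns None if every attempt comes back empty.
    """
    for attempt in range(attempts):
        result = extract(fetch())
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause before asking for the page again
    return None
```

Inside the loop this could be used as text = fetch_with_retry(lambda: requests.get(url).content, extract_description), skipping or logging the URL when text is None. Whether a retry actually helps depends on why the element is missing, which the traceback alone does not reveal.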

Upvotes: 1
