Ms Nad

Reputation: 55

How to handle an empty list - multiple-page web scraping

I am trying to pull the question-and-answer section from Lazada through web scraping, but I am having an issue when some of the pages don't have any questions/answers. My code returns nothing when I run it for multiple web pages, and it only works for a single page that has a question and answer.

How do I make the code continue reading the rest of the web pages even though the first page has no questions?

I have tried adding an if/else statement to my code, as shown below.

 import bleach
 import csv
 import datetime
 from bs4 import BeautifulSoup

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

 now = datetime.datetime.now()
 print ("Date data being pulled:")
 print str(now)
 print ("")

 nameList = soup.findAll("div", {"class":"qna-content"})

for name in nameList:
    if nameList == None:
       print('None')
    else:
       print(name.get_text())
       continue

My expected output would be something like this:

None --> output from url1
None --> output from url2
can choose huzelnut? Hi Dear Customer , for the latest expiry date its on 2019 , and we will make sure the expiry date is still more than 6 months. --> output from url3

I appreciate your help, thanks in advance!

Upvotes: 1

Views: 175

Answers (2)

ewwink

Reputation: 19184

You have the wrong syntax: put if nameList == None: outside the loop, and you also need to fix the indentation.

import datetime

import requests
from bs4 import BeautifulSoup

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    nameList = soup.findAll("div", {"class":"qna-content"})
    if nameList == None:
        print(url, 'None')
        continue  # skip this URL

    for name in nameList:
        print(name.get_text())
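
One caveat with the check above: BeautifulSoup's findAll() returns an empty ResultSet rather than None when nothing matches, so nameList == None is never true and pages without questions simply print nothing. A truthiness check is what actually catches the empty pages; here is a small self-contained sketch (using an inline HTML snippet instead of a real Lazada page) showing the difference:

from bs4 import BeautifulSoup

# A page with no qna-content blocks at all.
html = "<html><body><div class='other'>no questions here</div></body></html>"
soup = BeautifulSoup(html, "html.parser")

nameList = soup.findAll("div", {"class": "qna-content"})

print(nameList == None)   # False: an empty ResultSet is not None
print(not nameList)       # True: this is the check that fires for an empty page

if not nameList:
    print('None')         # runs for pages that have no question/answer section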

Upvotes: 1

Ms Nad

Reputation: 55

I made some changes to the logic of the code and managed to print the records for now. Since I am still learning, I would appreciate it if others could share alternative/better solutions as well.

import datetime
from bs4 import BeautifulSoup
import requests

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

now = datetime.datetime.now()
print ("Date data being pulled:")
print(str(now))
print ("")

qna = soup.findAll("div", class_="qna-content")

for qnaqna in qna:
    if not qnaqna:
        print('List is empty')
    else:
        print(qnaqna.get_text())
        continue
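
One thing to watch: with the parsing done after the for url loop (as written above), only the soup from the last URL is inspected once the loop ends. Below is a minimal sketch that keeps all of the work inside the per-URL loop and, since the original question also imports csv, appends each Q&A block to a CSV file; the qna_output.csv filename and the column layout are illustrative assumptions, not part of the original post:

import csv
import datetime

import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']   # placeholder URLs, as in the question

# 'qna_output.csv' and the column names below are assumptions for illustration.
with open('qna_output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['date_pulled', 'url', 'qna_text'])

    for url in urls:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")

        now = datetime.datetime.now()
        qna = soup.findAll("div", class_="qna-content")

        if not qna:
            print(url, 'List is empty')
            continue   # move on to the next URL instead of stopping

        for block in qna:
            text = block.get_text(strip=True)
            print(text)
            writer.writerow([now.isoformat(), url, text])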

Upvotes: 1
