Reputation: 55
I am trying to pull the question-and-answer section from Lazada through web scraping, but I run into an issue when some of the pages don't have any questions/answers. My code returns nothing when I run it for multiple web pages; it works only for the one page that has a question and answer.
How do I make the code continue reading the rest of the web pages even though the first page has no questions?
I have tried adding an if/else statement in my code as shown below.
import bleach
import csv
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    nameList = soup.findAll("div", {"class": "qna-content"})
    for name in nameList:
        if nameList == None:
            print('None')
        else:
            print(name.get_text())
            continue
My expected output would be something like this:
None --> output from url1
None --> output from url2
can choose huzelnut? Hi Dear Customer , for the latest expiry date its on 2019 , and we will make sure the expiry date is still more than 6 months.--> output from url3
I appreciate your help, thanks in advance!
Upvotes: 1
Views: 175
Reputation: 19184
Your check is in the wrong place: put it outside the loop, and you also need to fix the indentation. Note that findAll returns an empty list, never None, when nothing matches, so test whether the list is empty rather than comparing it to None.
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    nameList = soup.findAll("div", {"class": "qna-content"})
    if not nameList:  # findAll returns an empty list when there are no matches
        print(url, 'None')
        continue  # skip this URL
    for name in nameList:
        print(name.get_text())
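For completeness, a minimal self-contained sketch (using an inline HTML snippet instead of a live Lazada page) showing that find_all returns an empty list rather than None when nothing matches:

```python
from bs4 import BeautifulSoup

# A page with no "qna-content" divs at all
html = "<div class='other'>no Q&amp;A section here</div>"
soup = BeautifulSoup(html, "html.parser")

nameList = soup.find_all("div", {"class": "qna-content"})
print(nameList == None)  # False: find_all never returns None
print(not nameList)      # True: an empty result list is falsy
```

This is why a `== None` comparison never triggers for a page without a Q&A section, while `if not nameList:` does.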
Upvotes: 1
Reputation: 55
I made some changes to the logic of the code and managed to print the records for now. Since I am still learning, I hope others will share alternative/better solutions as well.
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    qna = soup.findAll("div", class_="qna-content")
    if not qna:  # check emptiness before looping; the loop body never runs on an empty list
        print('List is empty')
        continue
    for qnaqna in qna:
        print(qnaqna.get_text())
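One possible alternative, as a minimal sketch using the same placeholder URLs and "qna-content" class: check the result list once before looping, and wrap the request in a try/except so a single bad URL doesn't stop the whole run (the 10-second timeout is just an illustrative choice):

```python
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx responses
    except requests.RequestException as err:
        print(url, 'request failed:', err)
        continue  # move on to the next URL

    soup = BeautifulSoup(response.content, "html.parser")
    print("Date data being pulled:", datetime.datetime.now())

    qna = soup.find_all("div", class_="qna-content")
    if not qna:  # find_all returns an empty list, never None
        print(url, 'List is empty')
        continue
    for item in qna:
        print(item.get_text())
```

Catching requests.RequestException covers timeouts, connection errors, and HTTP errors in one place, so each URL is handled independently.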
Upvotes: 1