Reputation: 849
I am new to Python; I just started yesterday. I want to scrape a website and collect the data in a dictionary. All the imports are added at the beginning of the Python script.
title_and_urls = {}  # dictionary
totalNumberOfPages = 12
for x in range(1, int(totalNumberOfPages) + 1):
    url_pages = 'https://abd.com/api?&page=' + str(x) + '&year=2017'
    resp = requests.get(url_pages, timeout=60)
    soup = BeautifulSoup(resp.text, 'lxml')
    for div in soup.find_all('div', {"class": "block2"}):
        a = div.find('a')
        h3 = a.find('h3')
        print(h3, url_pages)  # prints correctly
        title_and_urls[h3.text] = base_enthu_url + a.attrs['href']
print(title_and_urls)

with open('dict.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in title_and_urls.items():
        writer.writerow([key, value])
There are a few issues here:
1. I have 12 pages in total, but pages 7 and 8 were skipped.
2. The print line print(h3, url_pages) printed 60 items, while the csv file only has 36.
I appreciate all help and explanations. Please suggest best practices.
Upvotes: 0
Views: 51
Reputation: 133
Use a try/except block so one failing page does not stop the whole loop:
title_and_urls = {}  # dictionary
totalNumberOfPages = 12
for x in range(1, int(totalNumberOfPages) + 1):
    try:
        url_pages = 'https://abd.com/api?&page=' + str(x) + '&year=2017'
        resp = requests.get(url_pages, timeout=60)
        soup = BeautifulSoup(resp.text, 'lxml')
        for div in soup.find_all('div', {"class": "block2"}):
            a = div.find('a')
            h3 = a.find('h3')
            print(h3, url_pages)
            title_and_urls[h3.text] = base_enthu_url + a.attrs['href']
    except Exception:
        pass  # skip a page that raised instead of crashing the whole loop

with open('dict.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in title_and_urls.items():
        writer.writerow([key, value])
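Note that silently passing on every exception hides which pages failed. A stand-alone sketch of the same pattern that catches the specific requests exceptions and records the failed page numbers (here `fetch_page` is a hypothetical stub standing in for the real request, with pages 7 and 8 simulating timeouts):

```python
import requests

def fetch_page(x):
    """Stub for the real requests.get(...).text call.

    Pages 7 and 8 simulate a timeout, like the skipped pages
    in the question; the rest return dummy HTML.
    """
    if x in (7, 8):
        raise requests.Timeout('page %d timed out' % x)
    return '<html>page %d</html>' % x

failed_pages = []
for x in range(1, 13):
    try:
        html = fetch_page(x)
        # ... parse html with BeautifulSoup and fill title_and_urls as before ...
    except requests.RequestException:
        failed_pages.append(x)  # record the page instead of silently dropping it

print('failed pages:', failed_pages)  # → failed pages: [7, 8]
```

With a record like `failed_pages` you can retry just those pages afterwards instead of losing their data.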
Upvotes: 1