Reputation: 27
I am trying to extract more than one URL and description from a website so that I can download pictures from it. I was able to get it to output the first link and description, but I am confused about where to go from here to get the rest of the sets. My code to get the first link is below.
import requests
from bs4 import BeautifulSoup
import urllib
import csv
import time
URL = 'https://www.baps.org/vicharan'
content = requests.get(URL)
soup=BeautifulSoup(content.text, 'html.parser')
f = csv.writer(open('crawler.csv' , 'w'))
f.writerow(['description' , 'full_link'])
panelrow = soup.find('div' , {'class' : 'panelrow'})
main_class = panelrow.find_all('div' , {'class' : 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})
individual_classes = panelrow.find('a' , {'class' : 'highslidooo'})
for link in individual_classes.find_all('img'):
    links = link.get('src')
    full_link = 'https://www.baps.org' + links
    description = link.get('alt')
    f.writerow([description, full_link])
    print('--------------------')
    print(full_link)
    print(description)
Upvotes: 0
Views: 46
Reputation: 11903
I think your issue is that you are not using find_all and find correctly. Take a look at the documentation for them and some examples.
find finds the FIRST match only, so your individual_classes variable will only have one link in it.
find_all gets a container of ALL of the matches, which you can iterate.
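To see the difference concretely, here is a minimal sketch run against a small made-up HTML snippet. The markup and image paths are assumptions for illustration only; they borrow the class names from the question but are not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only loosely modelled on the class names in the question.
SAMPLE_HTML = """
<div class="panelrow">
  <a class="highslidooo"><img src="/img/one.jpg" alt="First"></a>
  <a class="highslidooo"><img src="/img/two.jpg" alt="Second"></a>
  <a class="highslidooo"><img src="/img/three.jpg" alt="Third"></a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find returns a single Tag (the first match), or None if nothing matches.
first_anchor = soup.find("a", {"class": "highslidooo"})
first_alt = first_anchor.find("img").get("alt")

# find_all returns a list-like ResultSet containing every match.
all_anchors = soup.find_all("a", {"class": "highslidooo"})
all_alts = [a.find("img").get("alt") for a in all_anchors]

print(first_alt)
print(len(all_anchors), all_alts)
```

With find you only ever see the first anchor; with find_all you get all three and can loop over them.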
So one recommendation is to check the length of each result along the way to see how many matches you are actually getting. Try:
len(panelrow)
len(main_class)
# etc...
and see how it is going. Sometimes a good pattern is to find a collection of outer containers with find_all and then use find within each one to get the first result, in a nested fashion. Something like this will probably work for you:
groups = soup.find_all(...)
for group in groups:
    link = group.find(...)
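A fuller sketch of that nested pattern might look like the following. It runs on an inline, made-up HTML sample rather than the live site, so the markup, the padding5 selector, and the extract_rows helper are all assumptions you would need to adapt to the real page.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.baps.org"

# Hypothetical markup: the real page may be structured differently.
SAMPLE_HTML = """
<div class="panelrow">
  <div class="padding5">
    <a class="highslidooo"><img src="/img/one.jpg" alt="First"></a>
  </div>
  <div class="padding5">
    <a class="highslidooo"><img src="/img/two.jpg" alt="Second"></a>
  </div>
  <div class="padding5">
    <p>No picture in this one</p>
  </div>
</div>
"""


def extract_rows(html, base_url=BASE_URL):
    """Return a list of (description, full_link) tuples, one per image found."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Outer step: collect every container with find_all.
    for group in soup.find_all("div", class_="padding5"):
        # Inner step: take the first image inside this container with find.
        img = group.find("img")
        if img is None or not img.get("src"):
            continue  # skip containers without a usable image
        rows.append((img.get("alt", ""), urljoin(base_url, img["src"])))
    return rows


rows = extract_rows(SAMPLE_HTML)

# Written to an in-memory buffer here; for a real file, use
# open('crawler.csv', 'w', newline='') inside a with block instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["description", "full_link"])
writer.writerows(rows)
print(buffer.getvalue())
```

Checking for None before reading src matters because find returns None when a container has no matching tag, and calling .get on None raises an AttributeError.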
Upvotes: 1