Bhavya Patel

Reputation: 27

Using find_all in BeautifulSoup

I am trying to extract more than one URL and description from a website so that I can download its pictures. I was able to output the first link and description, but I am unsure where to go from here to get the rest of the pairs. My code for getting the first link is below.

import requests
from bs4 import BeautifulSoup
import urllib
import csv
import time



URL = 'https://www.baps.org/vicharan'
content = requests.get(URL)

soup=BeautifulSoup(content.text, 'html.parser')


f = csv.writer(open('crawler.csv' , 'w'))
f.writerow(['description' , 'full_link'])

panelrow = soup.find('div' , {'class' : 'panelrow'})


main_class =  panelrow.find_all('div' , {'class' : 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})

individual_classes = panelrow.find('a' , {'class' : 'highslidooo'})

for link in individual_classes.find_all('img'):
    links=link.get('src')
    full_link = 'https://www.baps.org' + links
    description = link.get('alt')
    f.writerow([description , full_link])


print('--------------------')
print(full_link)
print(description)

Upvotes: 0

Views: 46

Answers (1)

AirSquid

Reputation: 11903

I think your issue is that you are not using find_all and find correctly. Take a look at the documentation for them and some examples.

find returns the FIRST match only (or None if nothing matches), so your individual_classes variable will only ever hold one link.

find_all returns a list-like container of ALL the matches, which you can iterate over.
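To make the difference concrete, here is a minimal sketch on a made-up HTML snippet. The markup and the file names in it are assumptions for illustration and are not taken from the real page; only the highslidooo class name comes from the question.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not the real page markup.
html = """
<div class="panelrow">
  <a class="highslidooo"><img src="/a.jpg" alt="first"></a>
  <a class="highslidooo"><img src="/b.jpg" alt="second"></a>
  <a class="highslidooo"><img src="/c.jpg" alt="third"></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find: a single Tag (the first match), or None when nothing matches.
first_link = soup.find("a", {"class": "highslidooo"})

# find_all: a list-like ResultSet holding every match.
all_links = soup.find_all("a", {"class": "highslidooo"})

first_alt = first_link.find("img").get("alt")
all_alts = [a.find("img").get("alt") for a in all_links]

print(first_alt)
print(len(all_links), all_alts)
```

Looping over the img tags inside first_link can only ever reach the first image, which matches the behaviour described in the question.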

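One recommendation is to check the length of each result along the way to see how many matches you are actually getting. Keep in mind that len on a single tag returned by find counts that tag's direct children, while len on a find_all result counts the matches. Try: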

len(panelrow)
len(main_class)
# etc...
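As a rough sketch of that kind of check, the snippet below prints a few counts for a tiny made-up document. The "item" and "does-not-exist" class names are hypothetical and exist only for this example.

```python
from bs4 import BeautifulSoup

# Made-up, single-line HTML so there are no whitespace text nodes between tags.
html = (
    '<div class="panelrow">'
    '<div class="item"><img src="/a.jpg" alt="a"></div>'
    '<div class="item"><img src="/b.jpg" alt="b"></div>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

panelrow = soup.find("div", class_="panelrow")            # one Tag
items = panelrow.find_all("div", class_="item")           # ResultSet of matches
missing = soup.find("div", class_="does-not-exist")       # no match

print(len(panelrow))    # number of direct children of the tag
print(len(items))       # number of matching tags
print(missing is None)  # find signals "no match" with None
```

If one of these counts is 0, or a find call gives None, the selector at that step is the one to revisit before going any deeper.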

and see how it is going. A good pattern is often to find a collection of outer containers with find_all and then use find within each one to get the first result, in a nested fashion. Something like this will probably work for you:

groups = soup.find_all(...)
for group in groups:
    link = group.find(...)

Upvotes: 1
