Reputation: 59
I was doing web scraping but I got stuck/confused between find() and find_all().
Like where to use find_all() and where to use find().
Also, where can I use these methods, like in a for loop or on a ul/li list?
Here is the code I tried:
from bs4 import BeautifulSoup
import requests
urls = "https://www.flipkart.com/offers-list/latest-launches?screen=dynamic&pk=themeViews%3DAug19-Latest-launch-Phones%3ADTDealcard~widgetType%3DdealCard~contentType%3Dneo&wid=7.dealCard.OMU_5&otracker=hp_omu_Latest%2BLaunches_5&otracker1=hp_omu_WHITELISTED_neo%2Fmerchandising_Latest%2BLaunches_NA_wc_view-all_5"
source = requests.get(urls)
soup = BeautifulSoup(source.content, 'html.parser')
divs = soup.find_all('div', class_='MDGhAp')
names = divs.find_all('a')
full_name = names.find_all('div', class_='iUmrbN').text
print(full_name)
And got an error like this:
File "C:/Users/ASUS/Desktop/utube/sunil.py", line 9, in <module>
names = divs.find_all('a')
File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1601, in __getattr__
raise AttributeError(
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
So can anyone explain where I should use find() and where find_all()?
Upvotes: 5
Views: 12491
Reputation: 663
The find_all() method scans the entire document looking for results, but sometimes you only want to find one result. If you know a document only has one <title> tag, it's a waste of time to scan the entire document looking for more. Rather than passing in limit=1 every time you call find_all(), you can use the find() method ... the following two lines are nearly equivalent:
soup.find_all('title', limit=1)
soup.find('title')
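The practical difference shows up when nothing matches: find() returns the single Tag or None, while find_all() always returns a list-like ResultSet, possibly empty. A minimal sketch with a throwaway HTML string:
from bs4 import BeautifulSoup

# Throwaway document with exactly one <title> tag
soup = BeautifulSoup("<html><head><title>Demo</title></head></html>", "html.parser")

print(soup.find("title"))       # <title>Demo</title>   (a single Tag)
print(soup.find_all("title"))   # [<title>Demo</title>] (a ResultSet, prints like a list)
print(soup.find("h1"))          # None  -> check for None before using .text
print(soup.find_all("h1"))      # []    -> safe to loop over, the body just never runs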
Upvotes: 0
Reputation: 1432
Let us understand with the help of an example: I am trying to get the list of book names on this website: https://www.bookdepository.com/bestsellers
To iterate through all the book-related tags at once I use the find_all command; subsequently I use find inside each list item to get the title of the book.
Note: find will fetch you the first match (the only match in this case), while find_all will produce a list of all matching items, which you can then iterate through.
from bs4 import BeautifulSoup as bs
import requests

url = "https://www.bookdepository.com/bestsellers"
response = requests.get(url)
soup = bs(response.content, 'html.parser')
Use find_all to go through all book items:
a = soup.find_all("div", class_="item-info")
Use find to get the title of each book inside each book item:
for i in a:
    print(i.find("h3", class_="title").get_text())
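The same pattern fixes the error in the question: call find_all() once to get a list of divs, then find() on each item. (This is just a sketch; the class names are copied from the question and assume soup was built from the Flipkart URL there, and Flipkart's markup changes often, so they may be stale.)
divs = soup.find_all('div', class_='MDGhAp')      # ResultSet of matching divs
for div in divs:
    name = div.find('div', class_='iUmrbN')       # a single Tag (or None) per div
    if name:
        print(name.text)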
Upvotes: 0
Reputation: 3010
find() - It just returns the first result when the searched element is found on the page, and the return type is <class 'bs4.element.Tag'>.
find_all() - It returns all the matches, i.e. it scans the entire document and returns every result, and the return type is <class 'bs4.element.ResultSet'>.
from robobrowser import RoboBrowser

browser = RoboBrowser(history=True, parser='html.parser')
browser.open('http://www.stackoverflow.com')

res = browser.find('h3')
print(type(res), res)
print(" ")

res = browser.find_all('h3')
print(type(res), res)
print(" ")

print("Iterating the Resultset")
print(" ")
for x in range(0, len(res)):
    print(x, res[x])
    print(" ")
Output:
<class 'bs4.element.Tag'> <h3><a href="https://stackoverflow.com">current community</a>
</h3>
<class 'bs4.element.ResultSet'> [<h3><a href="https://stackoverflow.com">current community</a>
</h3>, <h3>
your communities </h3>, <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>, <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>]
Iterating the Resultset
0 <h3><a href="https://stackoverflow.com">current community</a>
</h3>
1 <h3>
your communities </h3>
2 <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>
3 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>
4 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>
5 <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>
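The same type comparison works with plain BeautifulSoup if you don't want the RoboBrowser dependency; a rough equivalent sketch:
from bs4 import BeautifulSoup
import requests

resp = requests.get('http://www.stackoverflow.com')
soup = BeautifulSoup(resp.content, 'html.parser')

res = soup.find('h3')          # first <h3> only -> bs4.element.Tag
print(type(res))

res = soup.find_all('h3')      # every <h3> on the page -> bs4.element.ResultSet
print(type(res))

for i, tag in enumerate(res):
    print(i, tag.get_text(strip=True))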
Upvotes: 3
Reputation: 91
Found this in the Beautiful Soup documentation. If you are scraping something more specific, try find, and if you are scraping something more general, like a or span tags, probably give find_all a try.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
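(The soup in that snippet is built from the "three little sisters" document used throughout the docs; a trimmed-down version, if you want to run it standalone:)
from bs4 import BeautifulSoup

# Minimal version of the docs' example document
html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>.</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')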
Hope this helps!
Upvotes: 0
Reputation: 2094
Maybe it is clearer with this example:
from bs4 import BeautifulSoup

html = """
<ul>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

for n in soup.find('li'):
    # find() returns only the first <li> tag; iterating over that single
    # tag yields its children, so this prints just the text "First"
    print(n)

for n in soup.find_all('li'):
    # find_all() returns every <li> tag, so this prints each one
    print(n)
Result:
First
<li>First</li>
<li>Second</li>
<li>Third</li>
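In practice, for a ul/li list like the one in the question you usually want the text rather than the tags; a small sketch of both idioms (reusing the soup from above):
# find(): first <li> only -> a single Tag, so .get_text() works directly
first = soup.find('li').get_text()                           # 'First'

# find_all(): every <li> -> a list you can loop over or use in a comprehension
all_items = [li.get_text() for li in soup.find_all('li')]    # ['First', 'Second', 'Third']

print(first)
print(all_items)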
For more information please read https://www.crummy.com/software/BeautifulSoup/bs4/doc/#calling-a-tag-is-like-calling-find-all
Upvotes: 0