Reputation: 21
Can anyone confirm whether "find_all" automatically searches within tags? I was expecting find_all("a") to pick up everything that contains "a", but it actually picks up everything within "<a ...> ... </a>". Also, what is the difference between "find_all" and "find"?
from bs4 import BeautifulSoup
import requests
url = "https://boston.craigslist.org/search/sof"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data,'html.parser')
tags = soup.find_all("a")
Result:
[<a class="appstorebtn" href="https://play.google.com/store/apps/details?id=org.craigslist.CraigslistMobile">
Android
</a>,
<a class="appstorebtn" href="https://apps.apple.com/us/app/craigslist/id1336642410">
iOS
</a>,
<a class="header-logo" href="/" name="logoLink">CL</a>,
<a href="/">boston</a>,
<a href="https://post.craigslist.org/c/bos">post</a>,
<a href="https://accounts.craigslist.org/login/home">account</a>,
<a class="favlink" href="#"><span aria-hidden="true" class="icon icon-star fav"></span><span class="fav-number">0</span><span class="fav-label"> favorites</span></a>,
<a class="to-banish-page-link" href="#">
<span aria-hidden="true" class="icon icon-trash red"></span>
<span class="banished_count">0</span>
<span class="discards-label"> hidden</span>
</a>,
<a class="header-logo" href="/">CL</a>,
Upvotes: 1
Views: 940
Reputation: 25087
find_all()
The find_all() method looks through a tag’s descendants, retrieves all descendants that match your filters, and returns a list containing the results.
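To make the behaviour in the question concrete, here is a minimal sketch run against a small made-up HTML snippet (the snippet is an assumption for illustration, not the real Craigslist page). It suggests that find_all("a") matches elements by tag name, returns each matching element whole (nested tags included), and does not match text that merely contains the letter "a". A find() call is included for contrast.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this example (not the real page).
html = """
<p>a paragraph that mentions a few a's</p>
<a class="favlink" href="#"><span class="fav-number">0</span> favorites</a>
<a href="/">boston</a>
"""

soup = BeautifulSoup(html, "html.parser")

# The string "a" is treated as a tag name, so only <a> elements match.
tags = soup.find_all("a")
print(len(tags))           # the <p> element is not counted
print(tags[0])             # the whole <a> element, nested <span> included
print(tags[0].get_text())  # only the text inside that element

# find() applies the same filter but stops at the first match.
first = soup.find("a")
print(first == tags[0])

# With no match, find() gives None and find_all() gives an empty list.
print(soup.find("table"))
print(soup.find_all("table"))
```

So the nested spans in the question's output are there because each result is the complete <a> element, not because find_all searched for the letter.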
find() vs find_all()
Use find() if you just want the first occurrence that matches your filters.
Use find_all() if you want all occurrences that match your filters.
Example - Get all href
from bs4 import BeautifulSoup
import requests
url = "https://boston.craigslist.org/search/sof"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data,'html.parser')
[a['href'] for a in soup.find_all('a',href=True)]
Output
(You may have to iterate over the list and clean it, or customize the filters above to get only href values that contain http, ...)
['https://play.google.com/store/apps/details?id=org.craigslist.CraigslistMobile',
'https://apps.apple.com/us/app/craigslist/id1336642410',
'/',
'/',
'https://post.craigslist.org/c/bos',
'https://accounts.craigslist.org/login/home',
'#',
'#',
'/',
'https://accounts.craigslist.org/savesearch/save?URL=https%3A%2F%2Fboston%2Ecraigslist%2Eorg%2Fd%2Fsoftware%2Dqa%2Ddba%2Detc%2Fsearch%2Fsof',
'/d/software-qa-dba-etc/search/sof',
'/d/software-qa-dba-etc/search/sof',
'/d/software-qa-dba-etc/search/sof?sort=date&',
...]
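One possible way to customize the filter so that only links starting with http come back is to pass a compiled regular expression as the href value. This is a sketch under assumptions: the HTML below is a made-up stand-in for the page, and the pattern `^https?://` is just one reasonable choice.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this example (not the real page).
html = """
<a href="https://post.craigslist.org/c/bos">post</a>
<a href="/">boston</a>
<a href="#">favorites</a>
<a name="logoLink">no href here</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled regex as the attribute value keeps only <a> tags whose
# href matches it; tags without an href are skipped as well.
absolute = [a["href"] for a in soup.find_all("a", href=re.compile(r"^https?://"))]
print(absolute)
```

Relative links such as "/" and "#" are dropped by this filter; if you need them too, keep href=True and clean the list afterwards.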
Upvotes: 1