Reputation: 488
I am writing a parser for the website [https://edp.by/shop/womens-fragrances/][1]. First, I collected all the links from the site so I can navigate through it:
import requests
from bs4 import BeautifulSoup

def get_html(url):
    r = requests.get(url,'lxml')
    return r.text

url='https://edp.by/'
html=get_html(url)
soup=BeautifulSoup(html, )
x = soup.findAll("div", {"class": "row mainmenu"})
#print(x)
links=[]
for i in x:
    z=i.find_all("ul", {"class": "nav navbar-nav"})[0].find_all("a", {"class": "dropdown-toggle"})
    print(233,z,len(z),type(z))
    for i in z:
        q=i["href"]
        links.append(url+str(q))
Then I try to get each product from the links:
url='https://edp.by/shop/womens-fragrances/'
html=get_html(url)
soup=BeautifulSoup(html, )
#x = soup.findAll("div", {"class": "row"})
#print()
action = soup.find('form').get('action')
print(action)
The result is: /search/
But on the website I can see the whole structure in the browser's inspector (Chrome developer tools):
<form method="get" action="/shop/womens-fragrances/">
  <div class="rr-widget" data-rr-widget-category-id="594" data-rr-widget-id="516e7cba0d422d00402a14b4" data-rr-widget-width="100%"></div>
  <div class="shop_block">
    <div class="shop_table">
      <div class="col-md-4 col-sm-4 col-xs-12 product">
        <div class="block">
          <a href="/shop/womens-fragrances/43653/">
            <img src="/images/no-image.png" class="text-center" alt="" title="">
            <p class="fch"></p>
            <p class="tch">0,00 руб. </p>
          </a>
I want to get the link to each product, its image, and its texts, but bs4 does not show them. What is the reason, and how can I get them? I also tried mechanicalsoup, with the same result:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open(links[0])
form = browser.select_form('form')
action = form.form.attrs['action']
print(action)  # prints /search/
Upvotes: 0
Views: 268
Reputation: 28650
`.find()` will only get the first occurrence of that tag. There are 6 `<form>` elements on the page. You can use `.find_all()` instead; when you iterate through the result, you'll see that the form you want is at index 3 of that list:
import requests
from bs4 import BeautifulSoup

def get_html(url):
    # Note: 'lxml' is passed here as the second positional argument (params),
    # so requests appends it to the URL as "?lxml"; it is not a parser choice.
    r = requests.get(url,'lxml')
    return r.text

url='https://edp.by/'
html=get_html(url)
soup=BeautifulSoup(html, )
x = soup.findAll("div", {"class": "row mainmenu"})
#print(x)
links=[]
for i in x:
    z=i.find_all("ul", {"class": "nav navbar-nav"})[0].find_all("a", {"class": "dropdown-toggle"})
    print(233,z,len(z),type(z))
    for i in z:
        q=i["href"]
        links.append(url+str(q))

url='https://edp.by/shop/womens-fragrances/'
html=get_html(url)
soup=BeautifulSoup(html, 'html.parser')
#x = soup.findAll("div", {"class": "row"})
#print()
# Collect every <form> on the page and print each one's action attribute
actions = soup.find_all('form')
for action in actions:
    alpha = action.get('action')
    print(alpha)
Output:
/search/
/filter-ajax/
/filter-ajax/
/shop/womens-fragrances/
/shop/womens-fragrances/?lxml
/users/
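Once you know which form holds the products, you can also select it by its `action` attribute instead of by position and then walk the product blocks inside it. The following is only a minimal sketch under some assumptions: it parses an inline copy of the markup quoted in the question (closed off so it is well-formed, with an extra `/search/` form added in front to mimic the page) rather than the live site, and `parse_products`, its `base_url` parameter, and the dictionary keys are hypothetical names chosen for illustration. The real page may differ from the excerpt.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Inline copy of the excerpt from the question. The closing tags and the
# leading /search/ form are assumptions added so the sample is self-contained.
HTML = """
<form method="get" action="/search/"></form>
<form method="get" action="/shop/womens-fragrances/">
  <div class="shop_block">
    <div class="shop_table">
      <div class="col-md-4 col-sm-4 col-xs-12 product">
        <div class="block">
          <a href="/shop/womens-fragrances/43653/">
            <img src="/images/no-image.png" class="text-center" alt="" title="">
            <p class="fch"></p>
            <p class="tch">0,00 руб. </p>
          </a>
        </div>
      </div>
    </div>
  </div>
</form>
"""


def parse_products(html, action="/shop/womens-fragrances/", base_url="https://edp.by/"):
    """Return link, image and texts for every product inside the matching form."""
    soup = BeautifulSoup(html, "html.parser")
    # Select the form by its action attribute rather than by its index.
    form = soup.find("form", attrs={"action": action})
    if form is None:
        return []

    products = []
    # class_="product" matches any div whose class list contains "product".
    for block in form.find_all("div", class_="product"):
        link = block.find("a", href=True)
        if link is None:
            continue
        img = link.find("img")
        name = link.find("p", class_="fch")
        price = link.find("p", class_="tch")
        products.append({
            # urljoin resolves the site-relative paths against the base URL.
            "url": urljoin(base_url, link["href"]),
            "image": urljoin(base_url, img["src"]) if img and img.get("src") else None,
            "name": name.get_text(strip=True) if name else "",
            "price": price.get_text(strip=True) if price else "",
        })
    return products


for product in parse_products(HTML):
    print(product)
```

To try this against the real page, you would pass the downloaded HTML instead of `HTML`. `urljoin` is used here because simply concatenating `url + href`, as in the link-collecting loop, gives a double slash whenever the base ends with `/` and the `href` starts with `/`.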
Upvotes: 1