Reputation: 29
I am trying to scrape product names and prices from a local website using Beautiful Soup. My code:
import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.mechta.kz'  # assumed value; not shown in the original post
productlinks = []
for x in range(1, 3):
    # Fetch one page of the filtered listing
    r = requests.get(f'https://www.mechta.kz/section/stiralnye-mashiny/?arrFilter5_pf%5BNEW%5D=&arrFilter5_pf%5BARFP%5D=43843%2C43848&arrFilter5_pf%5BPROMOCODE_PROCENT%5D%5BLEFT%5D=&arrFilter5_pf%5BPROMOCODE_PROCENT%5D%5BRIGHT%5D=&arrFilter5_pf%5BMINPRICE_s1%5D%5BLEFT%5D=38990&arrFilter5_pf%5BMINPRICE_s1%5D%5BRIGHT%5D=1171000&set_filter=Y&PAGEN_2={x}')
    soup = BeautifulSoup(r.content, 'lxml')
    productlist = soup.find_all('div', class_='aa_st_img iprel')
    for item in productlist:
        for link in item.find_all('a', href=True):
            productlinks.append(baseurl + link['href'])
The code works, but it does not get all the products from the website: some products are skipped, and no links are collected for them.
Could you please suggest a solution to this problem?
Thanks!
Upvotes: 0
Views: 295
Reputation: 318
You can try other ways of sourcing the product URLs. In your specific case, Mechta has a sitemap index: fetch the sitemaps it lists and parse the XML.
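A minimal sketch of the sitemap approach, using only the standard library. The answer does not give the sitemap index URL, so the URL in the usage note is hypothetical (sites usually advertise the real one in robots.txt), and the helper names `fetch_bytes`, `extract_locs` and `collect_urls` are made up for illustration. It assumes the index follows the standard sitemaps.org schema and is only one level deep.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap index files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def extract_locs(xml_data):
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def collect_urls(index_url, fetch=fetch_bytes):
    """Read a sitemap index, then gather the URLs from each child sitemap."""
    urls = []
    for sitemap_url in extract_locs(fetch(index_url)):
        urls.extend(extract_locs(fetch(sitemap_url)))
    return urls
```

Usage would look something like `collect_urls('https://www.mechta.kz/sitemap.xml')`, where the path is an assumption. The result contains every URL in the sitemaps, so you would still need to filter it down to product pages, for example by a path prefix.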
Upvotes: 0
Reputation: 10414
Judging by the page at that link, every product link carries the class j_product_link, so we can find all a tags with that class, e.g.
soup.find_all('a', class_='j_product_link')
Possible solution:
import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.mechta.kz'  # assumed value; not shown in the question
productlinks = []
for x in range(1, 3):
    r = requests.get(f'https://www.mechta.kz/section/stiralnye-mashiny/?arrFilter5_pf%5BNEW%5D=&arrFilter5_pf%5BARFP%5D=43843%2C43848&arrFilter5_pf%5BPROMOCODE_PROCENT%5D%5BLEFT%5D=&arrFilter5_pf%5BPROMOCODE_PROCENT%5D%5BRIGHT%5D=&arrFilter5_pf%5BMINPRICE_s1%5D%5BLEFT%5D=38990&arrFilter5_pf%5BMINPRICE_s1%5D%5BRIGHT%5D=1171000&set_filter=Y&PAGEN_2={x}')
    soup = BeautifulSoup(r.content, 'lxml')
    # Select the product anchors directly instead of going through the wrapper div
    productlist = soup.find_all('a', class_='j_product_link')
    for link in productlist:
        productlinks.append(baseurl + link['href'])
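One possible reason the original selector misses products: when `class_` is given a string containing several classes, Beautiful Soup compares it with the exact value of the `class` attribute, so a div with an extra class is skipped. The sketch below shows this on made-up markup (`SAMPLE_HTML` is hypothetical, not copied from the site) and also builds absolute URLs with `urljoin` and drops duplicates, which is an extra step not in the answer above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the second wrapper div has an extra class,
# and the third repeats the first product's link.
SAMPLE_HTML = """
<div class="aa_st_img iprel"><a class="j_product_link" href="/product/a/">A</a></div>
<div class="aa_st_img iprel extra"><a class="j_product_link" href="/product/b/">B</a></div>
<div class="aa_st_img iprel"><a class="j_product_link" href="/product/a/">A again</a></div>
"""


def extract_product_links(html, base_url):
    """Return unique absolute URLs of all a.j_product_link tags, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", class_="j_product_link", href=True):
        url = urljoin(base_url, a["href"])
        if url not in links:  # skip links already collected
            links.append(url)
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The exact-string match only sees the divs whose class attribute is exactly "aa_st_img iprel".
exact_match_divs = soup.find_all("div", class_="aa_st_img iprel")
product_links = extract_product_links(SAMPLE_HTML, "https://www.mechta.kz")
```

With this sample, `exact_match_divs` holds two of the three wrapper divs, while `product_links` contains both distinct products. Whether the real page behaves the same way depends on its actual markup.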
Upvotes: 1