Reputation: 493
I am trying to build a list of the product-page links on a search-results page.
I have multiple search URLs from which I want to collect the product-page links.
Below is the code for just one of them.
import requests
from bs4 import BeautifulSoup

r = requests.get("https://funskoolindia.com/products.php?search=9723100")
soup = BeautifulSoup(r.content)
for a_tag in soup.find_all('a', class_='product-bg-panel', href=True):
    print('href: ', a_tag['href'])
This is what it should print: https://funskoolindia.com/product_inner_page.php?product_id=1113
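A minimal standard-library sketch of the "collect the links from several pages into one list" part, assuming the HTML for each page has already been fetched. The sample markup and the names `ProductLinkParser` and `collect_product_links` are hypothetical. It matches links by the `product_inner_page.php` pattern from the expected output rather than by CSS class, and removes duplicates while keeping their order. Keep in mind that, as the answers below point out, the live search page fills in its product grid with JavaScript, so the raw HTML returned by `requests.get` may not contain these links at all.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collects href values of <a> tags whose href contains a marker substring."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None when the <a> has no href
        if href and self.marker in href:
            self.links.append(href)


def collect_product_links(pages, marker="product_inner_page.php"):
    """Return product links from several HTML pages, de-duplicated in order."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for html in pages:
        parser = ProductLinkParser(marker)
        parser.feed(html)
        parser.close()
        for href in parser.links:
            seen.setdefault(href, None)
    return list(seen)


# Hypothetical HTML standing in for two already-downloaded pages
pages = [
    '<div class="product-bg-panel"><a href="product_inner_page.php?product_id=1113">Toy</a></div>',
    '<a href="index.php">Home</a>'
    '<div class="product-bg-panel"><a href="product_inner_page.php?product_id=1113">Toy</a></div>',
]
print(collect_product_links(pages))
```

Matching on the URL pattern instead of a class name is a design choice: it keeps working if the card markup changes, at the cost of relying on the link format staying the same.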
Upvotes: 1
Views: 78
Reputation: 43
Try this:
print('href: ', a_tag.get("href"))
and pass features="lxml" to the BeautifulSoup constructor.
Upvotes: 1
Reputation: 195603
The data is loaded dynamically by JavaScript from a different URL. One solution is to use selenium, which executes the JavaScript and loads the links that way.
Another solution is to use the re module and request the data URL manually:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://funskoolindia.com/products.php?search=9723100'
# Endpoint that the page's JavaScript calls to load the product grid
data_url = 'https://funskoolindia.com/admin/load_data.php'

data = {'page': '1',
        'sort_val': 'new',
        'product_view_val': 'grid',
        'show_list': '12',
        'brand_id': '',
        # The search page embeds this key in a <script> block; pull it out with a regex
        'checkboxKey': re.findall(r'var checkboxKey = "(.*?)";', requests.get(url).text)[0]}

# POST the form data and parse the HTML fragment that comes back
soup = BeautifulSoup(requests.post(data_url, data=data).text, 'lxml')
for a in soup.select('#list-view .product-bg-panel > a[href]'):
    # The hrefs are relative, so prefix the site root
    print('https://funskoolindia.com/' + a['href'])
Prints:
https://funskoolindia.com/product_inner_page.php?product_id=1113
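The two non-network steps of that approach (pulling `checkboxKey` out of the page with a regex and building the form payload) can be sketched offline with the standard library. The sample page text and the helper names `extract_checkbox_key` and `build_payload` are hypothetical; the field names and values are the ones used in the answer above. Using `re.search` returns `None` when the key is missing, instead of the `IndexError` that `re.findall(...)[0]` raises.

```python
import re
from urllib.parse import urlencode

# Hypothetical snippet standing in for the search page's HTML
SAMPLE_PAGE = '<script>var checkboxKey = "abc123";</script>'


def extract_checkbox_key(page_html):
    """Return the value assigned to checkboxKey, or None if it is absent."""
    match = re.search(r'var checkboxKey = "(.*?)";', page_html)
    return match.group(1) if match else None


def build_payload(checkbox_key, page=1):
    """Form fields as in the answer above; every value is sent as a string."""
    return {
        'page': str(page),
        'sort_val': 'new',
        'product_view_val': 'grid',
        'show_list': '12',
        'brand_id': '',
        'checkboxKey': checkbox_key,
    }


key = extract_checkbox_key(SAMPLE_PAGE)
payload = build_payload(key)
# This is the form-encoded body that requests.post(data_url, data=payload) would send
print(urlencode(payload))
```

The `page` argument is there on the assumption that the endpoint paginates through its `page` field, which would let you loop over several result pages; that behaviour is not confirmed here.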
Upvotes: 1
Reputation: 71471
The site is dynamic, so you can use selenium:
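from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
d.get('https://funskoolindia.com/products.php?search=9723100')
# href of the first <a> in each product card; the set removes duplicates
results = [*{i.a['href'] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'product-media light-bg'})}]
d.quit()
print(results)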
Output:
['product_inner_page.php?product_id=1113']
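Both answers return relative hrefs. A small standard-library sketch for turning them into absolute URLs and reading the `product_id` query parameter could look like the following. The helper names `to_absolute` and `product_ids` are hypothetical. `urljoin` is used here in place of the string concatenation in the `re`-based answer because it also copes with hrefs that are already absolute or that start with `/`.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = 'https://funskoolindia.com/products.php?search=9723100'


def to_absolute(hrefs, base=BASE_URL):
    """Resolve relative hrefs against the page they were scraped from."""
    return [urljoin(base, href) for href in hrefs]


def product_ids(urls):
    """Pull the product_id query parameter out of each URL, skipping URLs without one."""
    ids = []
    for url in urls:
        values = parse_qs(urlparse(url).query).get('product_id')
        if values:
            ids.append(values[0])
    return ids


results = ['product_inner_page.php?product_id=1113']  # output from the answer above
absolute = to_absolute(results)
print(absolute)
print(product_ids(absolute))
```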
Upvotes: 2