I am trying to scrape the pricing information from these two websites: site1 and site2. I am using Python with the BeautifulSoup and requests packages.
I noticed that the pricing section does not appear in the page source of either site, so I am wondering how I can scrape the data.
Any advice would be appreciated. Thank you.
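The problem is that you first need to select a country before the prices are shown.
Technically, you need to make a POST request to http://www.strem.com/catalog/index.php
to select a country. If you send it through a requests session, the session carries the resulting cookies (where the selection is presumably stored) to later requests, and then you can get the prices: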
from bs4 import BeautifulSoup
import requests

URL = "http://www.strem.com/catalog/v/29-6720/17/copper_1300746-79-5"

# A Session keeps the cookies set by the country selection for later requests
session = requests.Session()
session.post("http://www.strem.com/catalog/index.php",
             data={'country': 'USA',
                   'page_function': 'select_country',
                   'item_id': '7211',
                   'group_id': '17'})

# With the country selected, the product page includes the price cells
response = session.get(URL)
soup = BeautifulSoup(response.content, "html.parser")
print([td.text.strip() for td in soup.find_all('td', class_='price')])
This prints:
['US$85.00', 'US$285.00', 'US$1,282.00', 'US$3,333.00']
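If you would rather not depend on requests, the same two-step flow can be sketched with the standard library's `urllib.request` and `http.cookiejar`, where an opener built with `HTTPCookieProcessor` plays the role of the session. This is a swapped-in technique, not the method used above. The URLs and form fields are copied from the code above, the helper names `make_opener`, `select_country_request` and `fetch_product_html` are hypothetical, and it assumes the site still accepts the same form fields, which may no longer be true.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# URLs and form fields copied from the answer; the site may have changed since
SELECT_URL = "http://www.strem.com/catalog/index.php"
PRODUCT_URL = "http://www.strem.com/catalog/v/29-6720/17/copper_1300746-79-5"
FORM = {
    "country": "USA",
    "page_function": "select_country",
    "item_id": "7211",
    "group_id": "17",
}


def make_opener():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def select_country_request(form=FORM):
    """Build the country-selection request; a bytes body makes urllib send a POST."""
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(SELECT_URL, data=body)


def fetch_product_html(opener):
    """Select the country first, then fetch the product page with the same cookies."""
    with opener.open(select_country_request()) as resp:
        resp.read()
    with opener.open(PRODUCT_URL) as resp:
        return resp.read()
```

To use it, call `opener, jar = make_opener()` and then `html = fetch_product_html(opener)`, and pass `html` to BeautifulSoup as above. Building the POST request in its own function keeps the form-encoded body easy to inspect without touching the network.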
A more elegant solution would be to submit the country-selection form itself using the mechanize
package:
import mechanize
from bs4 import BeautifulSoup

URL = "http://www.strem.com/catalog/v/29-6720/17/copper_1300746-79-5"

browser = mechanize.Browser()
# Keep cookies so the country selection persists across requests
cj = mechanize.LWPCookieJar()
browser.set_cookiejar(cj)
browser.open(URL)
# nr=1 selects the second form on the page, which is the country selector here
browser.select_form(nr=1)
browser.form['country'] = ['USA']  # select controls take a list of values
browser.submit()
data = browser.response().read()
soup = BeautifulSoup(data, "html.parser")
print([td.text.strip() for td in soup.find_all('td', class_='price')])
This also prints:
['US$85.00', 'US$285.00', 'US$1,282.00', 'US$3,333.00']
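The scraped prices come back as strings. If you need to compare or total them, here is a minimal sketch for converting them to `Decimal`. It assumes a comma as the thousands separator and a dot as the decimal point, matching the US$ values printed above; `parse_price` is a hypothetical helper name.

```python
import re
from decimal import Decimal

# Digits with optional comma thousands separators and an optional decimal part
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Convert a price string such as 'US$1,282.00' to Decimal('1282.00').

    Assumes comma thousands separators and a dot decimal point, as in the
    US$ prices above; other locales would need different handling.
    """
    match = PRICE_RE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


prices = ["US$85.00", "US$285.00", "US$1,282.00", "US$3,333.00"]
amounts = [parse_price(p) for p in prices]
print(amounts)
```

`Decimal` is used instead of `float` so that currency values keep their exact cents when you add or compare them.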