Reputation: 254
I'm a beginner with python, I'm trying build a python program that will scrape product descriptions from http://turnpikeshoes.com/shop/TCF00003. Python has many libraries and I'm sure many approaches to achieving my goal. I've done a few successful scrapes using requests however the fields I was looking for are not showing up, Using chromes inspector I found an Ajax POST request.
Here is my code
from lxml import html
import requests
url = 'http://turnpikeshoes.com/shop/TCF00003'
#URL
headers = {'user-agent': 'my-app/0.0.1'}
#Header info sent to server
page = requests.get(url, headers=headers)
#Get response
tree = html.fromstring(page.content)
#Page Content
ShortDsc = tree.xpath('//span[@itemprop="reviewBody"]/text()')
LongDsc = tree.xpath('//li[@class="productLongDescription"]/text()')
print 'ShortDsc:', ShortDsc
print 'LongDsc:', LongDsc
I think I need to send a request directly to admin-ajax.php
Any help is greatly appreciated
Upvotes: 2
Views: 1071
Reputation: 939
This one worked for me.
from selenium import webdriver
ff = webdriver.PhantomJS()
ff.get(url)
ff.find_element_by_xpath("//span[@itemprop='price']").get_attribute("content")
Thanks!
Upvotes: 0
Reputation: 1296
The shop performs a POST request to itself during loading with the URL as parameter in the form data. Here is a little script using requests.Session which first opens the shop and then sends the POST request to get the product information. It simulates the steps a browser would do - saving the cookies from the first request - which is probably needed to get the desired response from the POST call.
import requests
product_url = 'http://turnpikeshoes.com/shop/TCF00003'
product_ajax = 'http://turnpikeshoes.com/wp-admin/admin-ajax.php'
data = {'mrQueries[]':'section', 'action':'mrshop_ajax_call', 'url': product_url}
s = requests.Session()
s.get(product_url)
r = s.post(product_ajax, data=data)
print(r.text)
You could also try to get the cookies at the beginning once and using them for all further POST requests. The shops afford to enable scripting to view the product plays in your hands by reducing the actual response size.
Upvotes: 0
Reputation:
You should try selenium in this case if you want to scrape javascript content:
from selenium import webdriver
import time
driver = webdriver.PhantomJS()
driver.get("http://turnpikeshoes.com/shop/TCF00003")
time.sleep(5)
LongDsc = driver.find_element_by_class_name("productLongDescription").text
print 'LongDsc:', LongDsc
Btw, you should also install PhantomJS as headless browser.
Upvotes: 1