Ryan Dooley
Ryan Dooley

Reputation: 254

How to scrape PHP Ajax using Python?

I'm a beginner with python, I'm trying build a python program that will scrape product descriptions from http://turnpikeshoes.com/shop/TCF00003. Python has many libraries and I'm sure many approaches to achieving my goal. I've done a few successful scrapes using requests however the fields I was looking for are not showing up, Using chromes inspector I found an Ajax POST request.

Here is my code

from lxml import html
import requests

url = 'http://turnpikeshoes.com/shop/TCF00003'
#URL
headers = {'user-agent': 'my-app/0.0.1'}
#Header info sent to server
page = requests.get(url, headers=headers)
#Get response
tree = html.fromstring(page.content)
#Page Content


ShortDsc = tree.xpath('//span[@itemprop="reviewBody"]/text()')

LongDsc = tree.xpath('//li[@class="productLongDescription"]/text()')

print 'ShortDsc:', ShortDsc
print 'LongDsc:', LongDsc

I think I need to send a request directly to admin-ajax.php

Any help is greatly appreciated

Upvotes: 2

Views: 1071

Answers (3)

Manre
Manre

Reputation: 939

This one worked for me.

from selenium import webdriver

ff = webdriver.PhantomJS()
ff.get(url)
ff.find_element_by_xpath("//span[@itemprop='price']").get_attribute("content")

Thanks!

Upvotes: 0

sharpshadow
sharpshadow

Reputation: 1296

The shop performs a POST request to itself during loading with the URL as parameter in the form data. Here is a little script using requests.Session which first opens the shop and then sends the POST request to get the product information. It simulates the steps a browser would do - saving the cookies from the first request - which is probably needed to get the desired response from the POST call.

import requests

product_url = 'http://turnpikeshoes.com/shop/TCF00003'
product_ajax = 'http://turnpikeshoes.com/wp-admin/admin-ajax.php'
data = {'mrQueries[]':'section', 'action':'mrshop_ajax_call', 'url': product_url}

s = requests.Session()
s.get(product_url)
r = s.post(product_ajax, data=data)

print(r.text)

You could also try to get the cookies at the beginning once and using them for all further POST requests. The shops afford to enable scripting to view the product plays in your hands by reducing the actual response size.

Upvotes: 0

user4280261
user4280261

Reputation:

You should try selenium in this case if you want to scrape javascript content:

from selenium import webdriver
import time

driver = webdriver.PhantomJS()
driver.get("http://turnpikeshoes.com/shop/TCF00003")
time.sleep(5)

LongDsc = driver.find_element_by_class_name("productLongDescription").text

print 'LongDsc:', LongDsc

Btw, you should also install PhantomJS as headless browser.

Upvotes: 1

Related Questions