pslover

Reputation: 72

requests.get() not retrieving correct URL in Python 2.7

I'm trying to access a URL and then parse its contents based on tags. My code:

import requests
from lxml import html

page = requests.get('https://support.apple.com/downloads/')
self.tree = html.fromstring(page.content)
names = self.tree.xpath("//span[@class='truncate_name']//text()")

Problem: the variable page contains the data of the URL 'https://support.apple.com/' instead. I'm new to Python 2.7 and to the whole issue of file encodings. I'm using unicode-escape as my default encoding. The encoding of the resource at https://support.apple.com/downloads/ is utf-8, whereas the encoding of the resource at https://support.apple.com/ is variable. Does this have something to do with the problem? Please suggest a solution.

Upvotes: 1

Views: 150

Answers (1)

Padraic Cunningham

Reputation: 180391

It has nothing to do with encoding. What you are looking for is created dynamically, so it is not in the source you get back; a series of AJAX calls populates the data. To get the product names etc. from the carousel where you see the span.truncate_name in your browser:

import requests

# Query parameters for the AJAX endpoint that populates the carousel
params = {"page": "products",
          "locale": "en_US",
          "doctype": "DOWNLOADS",
          }
js = requests.get("https://km.support.apple.com/kb/index", params=params).content

Normally we could call .json() on the response object, but in this case we need to decode the content with "unicode_escape" first and then call loads:

from json import loads

# Decode the escaped bytes first, then parse the result as JSON
js2 = loads(js.decode("unicode_escape"))
print(js2)

Which gives you a huge dict of data like:

{u'products': [{u'name': u'Servers and Enterprise', u'urlpath': u'serversandenterprise', u'order': u'', u'products': .............
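Since the result is nested (an entry under "products" can carry its own "products" list), a small recursive walk is one way to pull out the names. The following is only a minimal sketch, assuming every level uses the same "name" and "products" keys seen in the truncated output; the helper collect_names and the sample dict are hypothetical and stand in for the real js2.

```python
def collect_names(node):
    """Recursively gather every "name" value from a nested products dict.

    Assumes each level looks like {"name": ..., "products": [...]}; this is
    inferred from the truncated output, not confirmed for the full response.
    """
    names = []
    if isinstance(node, dict):
        if "name" in node:
            names.append(node["name"])
        # "products" may be missing or may not be a list at some levels
        children = node.get("products")
        if isinstance(children, list):
            for child in children:
                names.extend(collect_names(child))
    return names


# Hypothetical sample shaped like the printed dict. Only the first entry's
# name and urlpath come from the output above; the child entry is made up.
sample = {
    "products": [
        {
            "name": "Servers and Enterprise",
            "urlpath": "serversandenterprise",
            "order": "",
            "products": [
                {"name": "Example child", "urlpath": "examplechild", "order": ""},
            ],
        },
    ]
}

print(collect_names(sample))
```

With the real data you would call collect_names(js2) instead of passing the sample.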

You can see the request in the Network tab of Chrome's developer tools.

We leave off the callback parameter (callback: ACDownloadSearch.customCallBack) because we want to get back valid JSON.
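For context, an endpoint that accepts a callback parameter typically wraps its JSON in a function call (JSONP), which json.loads rejects. The sketch below shows, on a made-up string, what unwrapping such a response could look like if the callback were kept. The wrapped text, the helper strip_jsonp, and the assumption that this endpoint behaves like standard JSONP are all illustrative and not taken from the actual response.

```python
import json
import re

# Matches "someCallback( ... )" with an optional trailing semicolon.
_JSONP_RE = re.compile(r"^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def strip_jsonp(text):
    """Return the payload inside a JSONP wrapper, or text unchanged if there is none."""
    match = _JSONP_RE.match(text)
    return match.group(1) if match else text


# Hypothetical JSONP body; the real response may look different.
wrapped = 'ACDownloadSearch.customCallBack({"products": []});'

data = json.loads(strip_jsonp(wrapped))
print(data)
```

Leaving the callback out of params, as in the answer, avoids this extra step entirely.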

Upvotes: 2
