Reputation: 72
I'm trying to access a URL and then parse its contents based on tags. My code:
import requests
from lxml import html

page = requests.get('https://support.apple.com/downloads/')
self.tree = html.fromstring(page.content)
names = self.tree.xpath("//span[@class='truncate_name']//text()")
Problem: the variable page contains the data of the URL 'https://support.apple.com/' instead.
I'm new to Python 2.7 and have been running into encoding issues throughout the file. I'm using unicode-escape
as my default encoding. The encoding of the resource at https://support.apple.com/downloads/
is utf-8,
whereas the encoding of the resource at https://support.apple.com/
is variable. Does this have something to do with the problem? Please suggest a solution.
Upvotes: 1
Views: 150
Reputation: 180391
It has nothing to do with encoding. What you are looking for is created dynamically, so it is not in the source you get back. A series of ajax calls populates the data. To get the product names etc. from the carousel where you see the span.truncate_name
in your browser, request the data directly:
import requests

# Query the endpoint that the page's ajax calls use.
params = {"page": "products",
          "locale": "en_US",
          "doctype": "DOWNLOADS",
          }
js = requests.get("https://km.support.apple.com/kb/index", params=params).content
Normally we could call .json() on the response object, but in this case we need to decode the content with "unicode_escape" first
and then call loads:
from json import loads

# Decode the raw bytes with unicode_escape, then parse the result as JSON.
js2 = loads(js.decode("unicode_escape"))
print(js2)
Which gives you a huge dict of data like:
{u'products': [{u'name': u'Servers and Enterprise', u'urlpath': u'serversandenterprise', u'order': u'', u'products': .............
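To pull just the product names out of that structure, a small recursive walk should be enough. This is a minimal sketch that assumes every entry is a dict with a "name" key and an optional nested "products" list, as the truncated output above suggests. The collect_names helper and the sample data (including the "Example child" entry) are hypothetical, not part of the real response.

```python
def collect_names(node):
    """Return every "name" value found in a nested products structure."""
    names = []
    # "products" may be missing or empty, so fall back to an empty list.
    for product in node.get("products") or []:
        if not isinstance(product, dict):
            continue
        if "name" in product:
            names.append(product["name"])
        # Descend into any nested "products" list of this entry.
        names.extend(collect_names(product))
    return names


# Hypothetical sample shaped like the output above (not real response data).
sample = {
    "products": [
        {"name": "Servers and Enterprise", "urlpath": "serversandenterprise",
         "order": "", "products": [{"name": "Example child", "products": []}]},
    ]
}
print(collect_names(sample))
```

With the real data you would call collect_names(js2) instead of passing the sample.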
You can see the request in Chrome's developer tools.
We leave off the callback:ACDownloadSearch.customCallBack parameter,
as we want to get back valid JSON.
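For context, when a callback parameter is sent, a JSONP-style endpoint typically wraps the JSON payload in a function call, which json.loads rejects. The sketch below illustrates that on a made-up string. The payload and the strip_jsonp helper are hypothetical, and the exact format this endpoint returns with a callback is an assumption.

```python
import json
import re


def strip_jsonp(text):
    """Return the payload inside a callback(...) wrapper, or text unchanged."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    return match.group(1) if match else text


# Hypothetical JSONP body; the real response format is an assumption.
wrapped = 'ACDownloadSearch.customCallBack({"products": []});'

try:
    json.loads(wrapped)
except ValueError:
    # json raises ValueError (JSONDecodeError on Python 3) for wrapped input.
    print("not valid JSON while wrapped in the callback")

print(json.loads(strip_jsonp(wrapped)))
```

Leaving the callback off, as in the request above, avoids the need for this kind of unwrapping.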
Upvotes: 2