Jonathan Porter

Reputation: 1556

XPath not finding any results

I'm using Python 3.4, lxml, and requests to scrape Google Trends.

In this example, I'm trying to retrieve the text "Johnny Depp" located between these span tags. I'm new to the lxml module and XPath syntax, and I'm not sure what I'm doing wrong at this point.

Thank you in advance.

HTML:

<span class="hottrends-single-trend-title ellipsis-maker-inner">Johnny Depp</span>

Code:

from lxml import html
import requests

page = requests.get('https://trends.google.com/trends/hottrends')
tree = html.fromstring(page.content)

#This will create a list of trends:
trends = tree.xpath('//span[@class="hottrends-single-trend-title ellipsis-maker-inner"]/text()')

print('Trends: ', trends)


Upvotes: 0

Views: 1143

Answers (1)

har07

Reputation: 89285

Instead of scraping the HTML page, use the corresponding RSS URL. You can parse it with lxml's XML parser, or even with xml.etree from the standard library, since the XML structure is much simpler than its HTML counterpart. Given the RSS XML, you can simply iterate over the item elements and print each title, for example (though the top result is no longer 'Johnny Depp' now):

>>> from lxml import etree as ET
>>> import requests
>>> page = requests.get('https://trends.google.com/trends/hottrends/atom/feed?pn=p1')
>>> root = ET.fromstring(page.content)
>>> for trend in root.xpath('//item'):
...     print(trend.find('title').text)
... 
spinner
Old Navy Flip Flop Sale
You Get Me
Johnny Depp
NHL Draft
GLOW
Despicable Me 3
Blake Griffin
Robert Del Naja
DJ Khaled Grateful
Bella Thorne
Tubelight
interstellar
Camila Cabello
Mexico vs Russia
Frank Mason
Bam Adebayo
TJ Leaf
the house
Dwyane Wade
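
The same loop can also be written with xml.etree.ElementTree from the standard library. This is only a minimal sketch: it assumes the feed is plain RSS in which every item element has a title child, and both the SAMPLE_FEED string and the extract_titles helper are made up for illustration so that the snippet runs without network access.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for page.content; the real feed may differ in detail.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Hot Trends</title>
    <item><title>Johnny Depp</title></item>
    <item><title>NHL Draft</title></item>
  </channel>
</rss>"""


def extract_titles(xml_bytes):
    """Return the <title> text of every <item> element in the feed."""
    root = ET.fromstring(xml_bytes)
    # iter('item') walks the whole tree, much like the '//item' XPath above
    return [item.findtext('title') for item in root.iter('item')]


print(extract_titles(SAMPLE_FEED))
```

With a live response, you would pass page.content to extract_titles instead of SAMPLE_FEED.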

Upvotes: 1
