Reputation: 1556
Using Python 3.4, lxml, and requests to scrape google trends.
In this example, I'm trying to retrieve the text "Johnny Depp" located between these span tags. I'm new to the lxml module and XPath syntax but I'm not sure what I'm doing wrong at this point.
Thank you in advance.
HTML:
<span class="hottrends-single-trend-title ellipsis-maker-inner">Johnny Depp</span>
Code:
from lxml import html
import requests
page = requests.get('https://trends.google.com/trends/hottrends')
tree = html.fromstring(page.content)
#This will create a list of trends:
trends = tree.xpath('//span[@class="hottrends-single-trend-title ellipsis-maker-inner"]/text()')
print('Trends: ', trends)
Upvotes: 0
Views: 1143
Reputation: 89285
Using the corresponding RSS URL you can use lxml
's XML parser or even xml.etree
from the standard library since the XML structure is much simpler than the HTML counterpart. Given the RSS XML, you can just iterate through item
elements and print the title
, for example (though the top result is no longer 'Johnny Depp' now :)) :
>>> from lxml import etree as ET
>>> import requests
>>> page = requests.get('https://trends.google.com/trends/hottrends/atom/feed?pn=p1')
>>> root = ET.fromstring(page.content)
>>> for trend in root.xpath('//item'):
... print trend.find('title').text
...
spinner
Old Navy Flip Flop Sale
You Get Me
Johnny Depp
NHL Draft
GLOW
Despicable Me 3
Blake Griffin
Robert Del Naja
DJ Khaled Grateful
Bella Thorne
Tubelight
interstellar
Camila Cabello
Mexico vs Russia
Frank Mason
Bam Adebayo
TJ Leaf
the house
Dwyane Wade
Upvotes: 1