I am using Python to retrieve various metrics from a website (e.g., likes, Twitter shares). While XPath retrieves ordinary text just fine, I'm having trouble with these metrics (text within a span):
<span class="pluginCountTextDisconnected">78</span>
I need to get that "78", but Python returns nothing when I feed it the XPath.
Here's the XPath, just in case:
//*[@id="u_0_2"]/span[2]
Python code:
from lxml import html
import urllib2
from unicsv import CsvUnicodeReader
req=urllib2.Request("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html")
tree = html.fromstring(urllib2.urlopen(req).read())
fb_likes = tree.xpath('//*[@id="u_0_2"]/span[2]')
print fb_likes
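For reference, lxml's xpath() always returns a list. It is empty when nothing matches, which is what happens here because the span is not in the fetched page. When the path ends at an element, the list holds element objects rather than strings. A minimal Python 3 sketch on a hypothetical static snippet (no network access) that shows the difference between selecting the element and selecting its text with /text():

```python
from lxml import html

# Hypothetical static markup containing the span from the question (no network access).
snippet = '<div><span class="pluginCountTextDisconnected">78</span></div>'
tree = html.fromstring(snippet)

# A path that ends at an element returns a list of HtmlElement objects, not strings.
elements = tree.xpath("//span[@class='pluginCountTextDisconnected']")
print(elements)          # a one-element list of <Element span ...>
print(elements[0].text)  # 78

# Ending the path with /text() returns the text nodes themselves, as strings.
counts = tree.xpath("//span[@class='pluginCountTextDisconnected']/text()")
likes = int(counts[0]) if counts else None  # [] would mean nothing matched
print(likes)             # 78

# A path that matches nothing yields an empty list rather than raising an error.
print(tree.xpath("//span[@class='no-such-class']"))  # []
```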
Your span is inside an iframe, so you need to be 'inside' the iframe to get the text. (By the way, //span[@class='pluginCountTextDisconnected']/text() is the correct XPath, but you are querying the outer page, not the iframe.) So you need to read the iframe's src and parse that page instead, like this:
a = html.fromstring(urllib2.urlopen("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html").read())
iframe_src = a.xpath("//iframe/@src")[0]  # lxml elements have no .iframe attribute; select the src with XPath
iframe = html.fromstring(urllib2.urlopen(iframe_src).read())
fb_likes = iframe.xpath("//span[@class='pluginCountTextDisconnected']/text()")
Sorry, I didn't test the code; it's just a general idea.
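The general idea can be sketched in Python 3 with lxml alone. This is a minimal, untested-against-the-live-site sketch: it assumes the iframe is present in the downloaded HTML (lxml does not run JavaScript, so an iframe added by a script will not appear). The fetch parameter and the host names in the demo (news.example, plugins.example) are hypothetical placeholders so the parsing can be checked offline.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from lxml import html

COUNT_XPATH = "//span[@class='pluginCountTextDisconnected']/text()"


def fetch_url(url):
    """Download a URL and return the raw bytes (Python 3 replacement for urllib2.urlopen)."""
    with urlopen(url) as resp:
        return resp.read()


def count_in_first_iframe(page_html, page_url, fetch=fetch_url):
    """Follow the first <iframe> on a page and return the count text inside it.

    Returns None when the page has no iframe or the iframe has no count span.
    """
    page = html.fromstring(page_html)
    srcs = page.xpath("//iframe/@src")  # attribute values come back as strings
    if not srcs:
        return None  # nothing to follow in the static HTML
    # An iframe src may be relative or protocol-relative ("//host/path"),
    # so resolve it against the page URL before fetching it.
    iframe_url = urljoin(page_url, srcs[0])
    frame = html.fromstring(fetch(iframe_url))
    counts = frame.xpath(COUNT_XPATH)
    return counts[0].strip() if counts else None


if __name__ == "__main__":
    # Offline demo: hypothetical pages served from a dict instead of the network.
    fake_pages = {
        "http://plugins.example/like?x=1":
            b'<html><body><span class="pluginCountTextDisconnected">78</span></body></html>',
    }
    page = '<html><body><iframe src="//plugins.example/like?x=1"></iframe></body></html>'
    print(count_in_first_iframe(page, "http://news.example/article.html",
                                fetch=fake_pages.__getitem__))
```

Passing fetch in as a parameter keeps the network call separate from the XPath logic, so the parsing can be tried on saved HTML before pointing it at the real page.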
Update
import urllib2, lxml.html
iframe_asfile = urllib2.urlopen('http://www.facebook.com/plugins/like.php?action=recommend&app_id=&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FZEbdHPQfV3x.js%3Fversion%3D41%23cb%3Df112fd0c7b19666%26domain%3Dwww.nu.nl%26origin%3Dhttp%253A%252F%252Fwww.nu.nl%252Ff62d30922cee5%26relation%3Dparent.parent&href=http%3A%2F%2Fwww.nu.nl%2Fbinnenland%2F3866370%2Freddingsbrigade-redt-369-mensen-zomer-.html&layout=box_count&locale=nl_NL&sdk=joey&send=false&show_faces=true&width=75')
iframe_data = iframe_asfile.read()
iframe_asfile.close()
iframe_html = lxml.html.document_fromstring(iframe_data)
fb_likes = iframe_html.xpath(".//span[@class='pluginCountTextDisconnected']/text()")
print fb_likes[0]
This prints 78.
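The update is Python 2 code (urllib2 and the print statement). A minimal Python 3 sketch of the same steps follows. It splits parsing from downloading so the parsing can be checked offline, and it returns None instead of raising IndexError when the span is missing. The function names are hypothetical, and converting to int assumes the span holds a plain integer such as "78"; a locale-formatted count would need extra handling.

```python
from urllib.request import urlopen

import lxml.html

COUNT_XPATH = ".//span[@class='pluginCountTextDisconnected']/text()"


def parse_like_count(iframe_data):
    """Return the like count from the plugin page's HTML, or None if the span is missing."""
    iframe_html = lxml.html.document_fromstring(iframe_data)
    counts = iframe_html.xpath(COUNT_XPATH)
    # Guard against an empty result instead of letting counts[0] raise IndexError.
    # Assumes a plain integer like "78".
    return int(counts[0].strip()) if counts else None


def fetch_like_count(plugin_url):
    """Python 3 equivalent of the urllib2 code: download the plugin page, then parse it."""
    with urlopen(plugin_url) as iframe_asfile:  # the response is closed automatically
        return parse_like_count(iframe_asfile.read())


if __name__ == "__main__":
    # Offline check with a static snippet; pass the like.php URL from the update
    # to fetch_like_count() to query the live page.
    sample = b'<html><body><span class="pluginCountTextDisconnected">78</span></body></html>'
    print(parse_like_count(sample))  # 78
```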