Zlo

Reputation: 1170

Retrieving text within span class with XPath

I am using Python to retrieve various metrics from a website (e.g., likes, Twitter shares, etc.). XPath retrieves ordinary text on the page just fine, but I'm having trouble with these metrics, where the value is the text inside a span:

<span class="pluginCountTextDisconnected">78</span>

Now I need to get that "78", but Python does not return anything when I feed it the XPath.

Here's the XPath, just in case:

//*[@id="u_0_2"]/span[2]

Python code:

from lxml import html
import urllib2  
from unicsv import CsvUnicodeReader

req=urllib2.Request("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html")
tree = html.fromstring(urllib2.urlopen(req).read())
fb_likes = tree.xpath('//*[@id="u_0_2"]/span[2]')
print fb_likes

Upvotes: 1

Views: 2642

Answers (2)

German Petrov

Reputation: 1515

Your span is inside an iframe, so you have to query the iframe's document to get the text. (By the way, //span[@class='pluginCountTextDisconnected']/text() is the correct XPath, but you are running it against the outer page.) So you need to read the iframe's src, something like:

page = html.fromstring(urllib2.urlopen("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html").read())
# take the src of the first iframe on the page (assumes the like button is the first one)
iframe_src = page.xpath("//iframe/@src")[0]
iframe = html.fromstring(urllib2.urlopen(iframe_src).read())
fb_likes = iframe.xpath("//span[@class='pluginCountTextDisconnected']/text()")

Sorry, I didn't test this code; it's just the general idea.
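The two-step idea can be sketched offline, without touching the network. Everything in the two HTML strings below is a made-up stand-in (the example.com URL, the wrapping div, the first span); only the pluginCountTextDisconnected span is taken from the question. The sketch also assumes the iframe tag is present in the HTML the server returns, which may not hold if the page injects it with JavaScript.

```python
from lxml import html

# Hypothetical stand-ins for the two HTTP responses; neither is real
# nu.nl or Facebook markup.
OUTER_PAGE = (
    '<html><body><p>Article text</p>'
    '<iframe src="http://example.com/like-widget"></iframe>'
    '</body></html>'
)
WIDGET_PAGE = (
    '<html><body><div id="u_0_2"><span>Like</span>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</div></body></html>'
)

COUNT_XPATH = "//span[@class='pluginCountTextDisconnected']/text()"


def iframe_sources(page_html):
    """Return the src attribute of every iframe in an HTML string."""
    return [str(src) for src in html.fromstring(page_html).xpath('//iframe/@src')]


def like_count(widget_html):
    """Return the counter text from the widget document, or None if absent."""
    texts = html.fromstring(widget_html).xpath(COUNT_XPATH)
    return str(texts[0]).strip() if texts else None


# The outer page only contains the <iframe> tag, so the span is not found there.
outer_hits = html.fromstring(OUTER_PAGE).xpath(COUNT_XPATH)

# Step 1: find where the iframe content lives.
sources = iframe_sources(OUTER_PAGE)

# Step 2: run the same XPath against the iframe's own document.
count = like_count(WIDGET_PAGE)

print(outer_hits)
print(sources)
print(count)
```

In real use, WIDGET_PAGE would be the body fetched from one of the URLs in sources.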

Update

import urllib2, lxml.html

iframe_asfile = urllib2.urlopen('http://www.facebook.com/plugins/like.php?action=recommend&app_id=&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FZEbdHPQfV3x.js%3Fversion%3D41%23cb%3Df112fd0c7b19666%26domain%3Dwww.nu.nl%26origin%3Dhttp%253A%252F%252Fwww.nu.nl%252Ff62d30922cee5%26relation%3Dparent.parent&href=http%3A%2F%2Fwww.nu.nl%2Fbinnenland%2F3866370%2Freddingsbrigade-redt-369-mensen-zomer-.html&layout=box_count&locale=nl_NL&sdk=joey&send=false&show_faces=true&width=75')
iframe_data = iframe_asfile.read()
iframe_asfile.close()

iframe_html = lxml.html.document_fromstring(iframe_data)

fb_likes = iframe_html.xpath(".//span[@class='pluginCountTextDisconnected']/text()")
print fb_likes[0]

This prints 78.

Upvotes: 0

chishaku

Reputation: 4643

Add /text() to the XPath:

//*[@id="u_0_2"]/span[2]/text()
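As a rough illustration of what /text() changes, here is a small offline sketch. The wrapping div and the first span are invented for the example; only the second span comes from the question. Note that this only helps once the span is actually present in the parsed document.

```python
from lxml import html

# Hypothetical document: the div and the first span are assumptions made
# for illustration; the second span is the one shown in the question.
DOC = (
    '<html><body><div id="u_0_2">'
    '<span>Like</span>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</div></body></html>'
)

tree = html.fromstring(DOC)

# Without /text(): the result is a list of element objects.
elements = tree.xpath('//*[@id="u_0_2"]/span[2]')

# With /text(): the result is a list of strings.
texts = tree.xpath('//*[@id="u_0_2"]/span[2]/text()')

print(elements[0].tag, elements[0].text)
print(texts[0])
```

Either form gets at the value: read .text on the element, or take the first string from the /text() result.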

Upvotes: 1
