Reputation: 107
I wrote a script that parses a webpage and gets the number of links ('a' tags) on it:
import urllib
import lxml.html
connection = urllib.urlopen('http://test.com')
dom = lxml.html.fromstring(connection.read())
for link in dom.xpath('//a/@href'):
    print link
The output of the script:
./01.html
./52.html
./801.html
http://www.blablabla.com/1.html
#top
How can I convert it to a list so I can count the number of links? I tried link.split(), but it gave me:
['./01.html']
['./52.html']
['./801.html']
['http://www.blablabla.com/1.html']
['#top']
But I want to get:
[./01.html, ./52.html, ./801.html, http://www.blablabla.com/1.html, #top]
Thanks!
Upvotes: 2
Views: 102
Reputation: 387557
list(dom.xpath('//a/@href'))
This takes the sequence of results that dom.xpath returns and puts every item into a single list.
Upvotes: 3
Reputation: 40755
link.split() splits each individual link string, so you get a separate one-item list on every pass through the loop. You need to work with the object that represents all the links. In your case that is dom.xpath('//a/@href').
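To make the difference concrete, here is a small sketch that needs no network access. The hrefs list is only a stand-in for what dom.xpath('//a/@href') yields, with values copied from the question's output; the names per_link and links are just for illustration.

```python
# Stand-in for the result of dom.xpath('//a/@href'); values copied from the question.
hrefs = ['./01.html', './52.html', './801.html',
         'http://www.blablabla.com/1.html', '#top']

# Calling split() on each string separately gives one single-item list per link,
# because none of the hrefs contains whitespace to split on.
per_link = [href.split() for href in hrefs]

# Collecting the whole sequence at once gives one list that holds every link.
links = list(hrefs)

print(per_link[0])
print(links)
print(len(links))
```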
So this should help you:
links = list(dom.xpath('//a/@href'))
Then get the number of links with the built-in len function:
print len(links)
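For anyone on Python 3, the same idea might look roughly like the sketch below. It assumes lxml is installed, and it parses a made-up inline page (the PAGE string is hypothetical, built from the links in the question) instead of fetching http://test.com, so it runs without a network connection.

```python
import lxml.html

# Hypothetical page content standing in for the HTML fetched from the real site.
PAGE = """
<html><body>
  <a href="./01.html">one</a>
  <a href="./52.html">two</a>
  <a href="./801.html">three</a>
  <a href="http://www.blablabla.com/1.html">external</a>
  <a href="#top">top</a>
</body></html>
"""

dom = lxml.html.fromstring(PAGE)

# Collect every href attribute of every <a> element into one list.
links = list(dom.xpath('//a/@href'))

print(links)
print(len(links))
```

With a real page you would pass the downloaded HTML to lxml.html.fromstring instead of PAGE; in Python 3 the download call is urllib.request.urlopen rather than urllib.urlopen.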
Upvotes: 7