Reputation: 414
I want to find all links in a div, for example:
<div>
<a href="#0"></a>
<a href="#1"></a>
<a href="#2"></a>
</div>
So I wrote a function as follows:
def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links
    else:
        for a in div:
            links + get_links(a)
        return links
Why is the result [] rather than [a, a, a]?
I know this has to do with list references. Could you explain in some detail?
This is the complete module:
import lxml.html

def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links
    else:
        for a in div:
            links + get_links(a)
        return links

if __name__ == '__main__':
    fragment = '''
<div>
<a href="#0">1</a>
<a href="#1">2</a>
<a href="#2">3</a>
</div>'''
    fragment = lxml.html.fromstring(fragment)
    links = get_links(fragment)  # <---------------
Upvotes: 3
Views: 99
Reputation: 10223
Another option is to use the xpath method to get all a tags inside the div, at any nesting level.
Code:
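from lxml import etree

# content holds the markup from the question
content = '<div><a href="#0">1</a><a href="#1">2</a><a href="#2">3</a></div>'
root = etree.fromstring(content)
print(root.xpath('//div//a'))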
Output:
[<Element a at 0xb6cef0cc>, <Element a at 0xb6cef0f4>, <Element a at 0xb6cef11c>]
Upvotes: 0
Reputation: 114481
List addition in Python returns a new list obtained by concatenating its arguments; it doesn't change either of them:
x = [1, 2, 3, 4]
print(x + [5, 6]) # displays [1, 2, 3, 4, 5, 6]
print(x) # here x is still [1, 2, 3, 4]
To modify the list in place, you can use the extend method:
x.extend([5, 6])
or +=:
x += [5, 6]
The latter is, IMO, a bit "strange" because it is a case in which x = x + y is not the same as x += y, so I prefer to avoid it and make the in-place extension explicit.
For your code,
links = links + get_links(a)
would also be acceptable, but remember that it does a different thing: it allocates a new list holding the concatenation and then rebinds the name links to it. It doesn't change the original object that links referred to:
x = [1, 2, 3, 4]
y = x
x = x + [5, 6]
print(x) # displays [1, 2, 3, 4, 5, 6]
print(y) # displays [1, 2, 3, 4]
but
x = [1, 2, 3, 4]
y = x
x += [5, 6]
print(x) # displays [1, 2, 3, 4, 5, 6]
print(y) # displays [1, 2, 3, 4, 5, 6]
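The difference matters most when a list is passed into a function. Here is a small sketch of that case; the function names rebind and mutate are made up for illustration and do not come from the question:

```python
def rebind(items):
    # Builds a new list and rebinds only the local name `items`.
    items = items + [5, 6]
    return items


def mutate(items):
    # Extends the caller's list object in place.
    items += [5, 6]
    return items


a = [1, 2, 3, 4]
b = rebind(a)
print(a)       # displays [1, 2, 3, 4]
print(b is a)  # displays False

c = [1, 2, 3, 4]
d = mutate(c)
print(c)       # displays [1, 2, 3, 4, 5, 6]
print(d is c)  # displays True
```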
Upvotes: 2
Reputation: 31653
If the tag is not 'a', your code effectively does this:
# You create an empty list
links = []
for a in div:
    # You combine <links> with the result of get_links(), but you do not assign it to anything
    links + get_links(a)
# So you return an empty list
return links
You should replace + with +=:
links += get_links(a)
Or use extend():
links.extend(get_links(a))
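Putting the fix into the whole function might look like the sketch below. As an assumption made only so it runs without third-party packages, it parses the fragment with the standard library's xml.etree.ElementTree rather than lxml; the recursion relies only on .tag and iterating over children, which lxml elements also support.

```python
import xml.etree.ElementTree as ET


def get_links(element):
    """Recursively collect every <a> element at or below `element`."""
    links = []
    if element.tag == 'a':
        links.append(element)
    else:
        for child in element:
            # extend() changes `links` in place; the result of a bare
            # `links + get_links(child)` would simply be thrown away.
            links.extend(get_links(child))
    return links


fragment = ET.fromstring(
    '<div><a href="#0">1</a><a href="#1">2</a><a href="#2">3</a></div>'
)
links = get_links(fragment)
print([a.get('href') for a in links])  # displays ['#0', '#1', '#2']
```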
Upvotes: 1