zwidny

Reputation: 414

Python list in recursion

I want to find all links in a div, for example:

<div>
  <a href="#0"></a>
  <a href="#1"></a>
  <a href="#2"></a>
</div>

So I wrote a function as follows:

def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links   
    else:
        for a in div:
            links + get_links(a)
        return links

Why is the result [] rather than [a, a, a]?

I know this has to do with list references. Could you explain in some detail?

This is the complete module:

import lxml.html


def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links   
    else:
        for a in div:
            links + get_links(a)
        return links


if __name__ == '__main__':

    fragment = '''
        <div>
          <a href="#0">1</a>
          <a href="#1">2</a>
          <a href="#2">3</a>
        </div>'''
    fragment = lxml.html.fromstring(fragment)
    links = get_links(fragment)    # <---------------

Upvotes: 3

Views: 99

Answers (3)

Vivek Sable

Reputation: 10223

Another option is to use the xpath method to get all a tags inside a div, at any level.

Code:

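from lxml import etree
# Sample markup for illustration; substitute your own XML string
content = '<div><a href="#0">1</a><a href="#1">2</a><a href="#2">3</a></div>'
root = etree.fromstring(content)
print(root.xpath('//div//a'))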

Output:

[<Element a at 0xb6cef0cc>, <Element a at 0xb6cef0f4>, <Element a at 0xb6cef11c>]
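
If lxml is not installed, a similar query can be sketched with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. This is an alternative under that assumption, not the lxml approach shown above; the sample markup (including the nested p element) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; one link is nested to show depth handling.
content = """<div>
  <a href="#0">1</a>
  <p><a href="#1">2</a></p>
  <a href="#2">3</a>
</div>"""

root = ET.fromstring(content)

# ".//a" matches <a> elements at any depth below root, in document order.
links = root.findall(".//a")
hrefs = [a.get("href") for a in links]
print(hrefs)  # ['#0', '#1', '#2']
```

The path is relative to root here (".//a"), since ElementTree does not accept absolute paths on an element.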

Upvotes: 0

6502

Reputation: 114481

List addition in Python returns a new list obtained by concatenating the arguments; it doesn't change them:

x = [1, 2, 3, 4]
print(x + [5, 6])  # displays [1, 2, 3, 4, 5, 6]
print(x)           # here x is still [1, 2, 3, 4]

To change the list in place, you can use the extend method:

x.extend([5, 6])

or also +=

x += [5, 6]

The latter is IMO a bit "strange" because it's a case in which x=x+y is not the same as x+=y, so I prefer to avoid it and make the in-place extension more explicit.

For your code

links = links + get_links(a)

would also be acceptable, but remember that it does a different thing: it allocates a new list with the concatenation and then rebinds the name links to point to it. It doesn't change the original object referenced by links:

x = [1, 2, 3, 4]
y = x
x = x + [5, 6]
print(x)   # displays [1, 2, 3, 4, 5, 6]
print(y)   # displays [1, 2, 3, 4]

but

x = [1, 2, 3, 4]
y = x
x += [5, 6]
print(x)   # displays [1, 2, 3, 4, 5, 6]
print(y)   # displays [1, 2, 3, 4, 5, 6]
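
The same difference shows up when a list is passed into a function, which is the situation you get with recursion. Below is a minimal sketch; the function names add_rebinding and add_in_place are hypothetical and exist only for this illustration.

```python
def add_rebinding(items):
    # Builds a new list and rebinds the local name only;
    # the caller's list is left untouched.
    items = items + [5, 6]
    return items


def add_in_place(items):
    # Mutates the very list object the caller passed in.
    items += [5, 6]
    return items


original = [1, 2, 3, 4]

result_a = add_rebinding(original)
after_rebinding = list(original)   # snapshot of the caller's list

result_b = add_in_place(original)
after_in_place = list(original)    # snapshot of the caller's list

print(after_rebinding)        # [1, 2, 3, 4]
print(after_in_place)         # [1, 2, 3, 4, 5, 6]
print(result_a is original)   # False
print(result_b is original)   # True
```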

Upvotes: 2

Mariusz Jamro

Reputation: 31653

If the tag is not 'a', your code effectively does this:

# You create an empty list

links = []
for a in div:
    # You combine <links> with result of get_links() but you do not assign it to anything
    links + get_links(a)
# So you return an empty list   
return links

You should replace + with +=:

links += get_links(a)

Or use extend():

links.extend(get_links(a))
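
Put together, the whole function might look like the sketch below. To keep it self-contained it uses the standard library's xml.etree.ElementTree instead of lxml (an assumption on my part, not what the question uses); lxml elements expose the same .tag attribute and child iteration, so the function body should carry over unchanged.

```python
import xml.etree.ElementTree as ET


def get_links(element):
    """Return every <a> element at or below `element`, in document order."""
    if element.tag == "a":
        return [element]
    links = []
    for child in element:
        # extend() adds the child's results to this same list in place,
        # so nothing is discarded as it was with a bare `links + ...`.
        links.extend(get_links(child))
    return links


fragment = ET.fromstring(
    '<div>'
    '<a href="#0">1</a>'
    '<a href="#1">2</a>'
    '<a href="#2">3</a>'
    '</div>'
)
links = get_links(fragment)
hrefs = [a.get("href") for a in links]
print(hrefs)  # ['#0', '#1', '#2']
```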

Upvotes: 1
