Reputation: 12529
I have a list of URLs stored in a variable href. When I pass it through the function below, the only recipe_links returned come from the first URL in href. Are there any glaring errors in my code? I'm not sure why it doesn't loop through all 20 URLs I have stored in href. The results for the first URL in href are retrieved as expected, but I can't get the loop to move on to the next URL.
def first_page_links(link):
    recipe_links = []
    recipe_html = []
    for x in link:
        page_request = requests.get(x)
        recipe_html.append(html.fromstring(page_request.text))
        print recipe_html
        for x in recipe_html:
            recipe_links.append(x.xpath('//*[@id="content"]/ul/li/a/@href'))
        return recipe_links
Upvotes: 2
Views: 163
Reputation: 14179
Try moving your second loop and your return line out of the first loop, so that no redundant iteration happens and the final list is returned properly. Something like the following:
from lxml import html
import requests as rq

def first_page_links(links):
    recipe_links = []
    recipe_html = []
    for link in links:
        r = rq.get(link)
        recipe_html.append(html.fromstring(r.text))
    for rhtml in recipe_html:
        recipe_links.append(rhtml.xpath('//*[@id="content"]/ul/li/a/@href'))
    return recipe_links
Let us know if this works.
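A side note on the shape of the result: xpath returns a list for each page, so append stores one sub-list per URL. If a single flat list of links is what you're after, extend may be the better fit. Here is a minimal sketch of the difference; the pages data is made up and only stands in for what rhtml.xpath(...) would return per page.

```python
# Hypothetical per-page results, standing in for the lists that
# rhtml.xpath(...) would return for two pages.
pages = [["/recipe/1", "/recipe/2"], ["/recipe/3"]]

nested_links = []
flat_links = []
for hrefs in pages:
    nested_links.append(hrefs)  # adds the whole list as one element
    flat_links.extend(hrefs)    # adds each link individually

print(nested_links)  # [['/recipe/1', '/recipe/2'], ['/recipe/3']]
print(flat_links)    # ['/recipe/1', '/recipe/2', '/recipe/3']
```

Which one you want depends on whether you need to know which page each link came from.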
EDIT:
Consider the following:
y_list = []
final_list = []
for x in x_list:
    y_list.append(x)
    for y in y_list:
        final_list.append(y)
This is your function, simplified. Assuming x_list holds 3 URLs, what happens is the following:

1. x1 is appended to y_list. y_list is processed with only x1 so far, so x1 alone is appended to final_list. final_list now contains: [x1]
2. x2 is appended to y_list. y_list now contains x1 and x2. Both are processed and appended to final_list. final_list now contains: [x1, x1, x2]
3. x3 is appended to y_list. y_list now contains x1, x2, and x3. All three are processed and appended to final_list. final_list now contains: [x1, x1, x2, x1, x2, x3]

Since your second loop, which processes the items in the first list, is inside the first loop, which adds incrementally to the first list, the second loop will process your first list on every iteration of the first loop. This makes it a redundant iteration.
There are many ways to execute what you wanted to do, but if you're just appending to lists and need a one-pass loop on both, the above fix was all that's needed.
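If you want to see the walk-through above for yourself, here is a small runnable sketch that compares the nested version with the one-pass version. The function names nested and sequential are just illustrative, and plain strings stand in for the URLs.

```python
def nested(x_list):
    """Second loop inside the first: y_list is re-processed on every pass."""
    y_list = []
    final_list = []
    for x in x_list:
        y_list.append(x)
        for y in y_list:
            final_list.append(y)
    return final_list


def sequential(x_list):
    """Second loop after the first: each item is processed exactly once."""
    y_list = []
    final_list = []
    for x in x_list:
        y_list.append(x)
    for y in y_list:
        final_list.append(y)
    return final_list


print(nested(["x1", "x2", "x3"]))      # ['x1', 'x1', 'x2', 'x1', 'x2', 'x3']
print(sequential(["x1", "x2", "x3"]))  # ['x1', 'x2', 'x3']
```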
Upvotes: 2
Reputation: 473903
Watch where the return is placed. You probably want to return after all the loops are finished:
def first_page_links(link):
    recipe_links = []
    recipe_html = []
    for x in link:
        page_request = requests.get(x)
        recipe_html.append(html.fromstring(page_request.text))
        print recipe_html
    for x in recipe_html:
        recipe_links.append(x.xpath('//*[@id="content"]/ul/li/a/@href'))
    # return only once every URL has been processed
    return recipe_links
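To show why the placement matters, here is a stripped-down sketch with no network calls. The function names and the upper() step are placeholders for the real fetching and parsing; the only thing that differs between the two functions is the indentation of return.

```python
def return_inside_loop(items):
    results = []
    for item in items:
        results.append(item.upper())
        return results  # exits the function during the first iteration


def return_after_loop(items):
    results = []
    for item in items:
        results.append(item.upper())
    return results  # runs only after the loop has finished


print(return_inside_loop(["a", "b", "c"]))  # ['A']
print(return_after_loop(["a", "b", "c"]))   # ['A', 'B', 'C']
```

This matches the symptom in the question: only the first URL's results come back, because the function has already returned before the loop reaches the second one.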
Upvotes: 4