Reputation: 31
This is what I have at the moment:
import bs4
import requests

def getXkcdComic(comicUrl):
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        return str(img['src'])

link = getXkcdComic('https://xkcd.com/')
print(link)
It parses the HTML and gets one link, the first one. Since I know the URL ends at 1882 and the next one I want is 1881, I wrote this for loop to get the rest.
It only prints one result, as if there were no loop at all.
Strangely, if I reduce the indentation of the return statement, it returns a different URL.
I don't quite understand how for loops work yet.
Also, this is my first post ever here, so forgive my English and my ignorance.
Upvotes: 1
Views: 177
Reputation: 43
When return runs during the first iteration of the loop, the entire getXkcdComic function exits and returns.
Something like this may work and print the links as the original code intended:
import bs4
import requests

def getXkcdComic(comicUrl, number):
    res = requests.get(comicUrl + str(number))
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    return str(soup.select_one("div#comic > img")['src'])

link = 'https://xkcd.com/'
for i in range(20):
    print(getXkcdComic(link, 1882-i))
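To show the same structure without contacting xkcd, here is a minimal sketch in which a hypothetical fetch_img_src placeholder stands in for the requests and BeautifulSoup lookup (it only builds a fake URL). Each call to get_comic handles one comic and can return immediately, because the loop that visits all 20 numbers lives in the caller. The helper names are illustrative, not from the answer.

```python
def fetch_img_src(number):
    # Hypothetical placeholder for requests.get + BeautifulSoup;
    # it just builds a fake image URL for the comic number.
    return f"https://example.com/comic/{number}.png"


def get_comic(number):
    # One call handles one comic, so returning right away is fine here.
    return fetch_img_src(number)


def get_latest_links(latest=1882, count=20):
    # The loop lives in the caller: each iteration makes its own call.
    links = []
    for i in range(count):
        links.append(get_comic(latest - i))
    return links


for link in get_latest_links():
    print(link)
```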
Upvotes: 0
Reputation: 36063
The other answers are good and general, but for this specific case there's an even better way. xkcd provides a JSON API, so you can use a list comprehension:
import requests

def getXkcdComic(comicUrl):
    return [requests.get(comicUrl + str(1882 - i) + '/info.0.json').json()['img']
            for i in range(0,20)]
This is also faster and more friendly to the xkcd servers.
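A minimal sketch of the same JSON approach, assuming (as the answer's code does) that each https://xkcd.com/<number>/info.0.json document has an img field. It swaps requests for the standard library's urllib.request so there is no third-party dependency. The helper names and the fetch parameter are hypothetical; fetch exists so the URL building and img extraction can be exercised with a stub instead of the live site.

```python
import json
import urllib.request


def comic_json_url(base_url, number):
    # Per-comic metadata URL, following the pattern used in the answer
    return f"{base_url}{number}/info.0.json"


def fetch_json(url):
    # Standard-library fetch; requests.get(url).json() would work as well
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def get_xkcd_images(base_url="https://xkcd.com/", latest=1882, count=20,
                    fetch=fetch_json):
    # Keep only the "img" field of each comic's JSON document
    return [fetch(comic_json_url(base_url, latest - i))["img"]
            for i in range(count)]
```

Calling get_xkcd_images() with the default fetch makes one real HTTP request per comic, so 20 in total.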
Upvotes: 0
Reputation: 1815
It happens because you return inside the loop. Try this:
def getXkcdComic(comicUrl):
    links = list()  # separate name, so the HTTP response below doesn't overwrite it
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        links.append(str(img['src']))
    return links
You can also change this:
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
to this (it visits the same 20 comics, 1863 to 1882, in ascending order):
    for i in range(1863, 1883):
        res = requests.get(comicUrl + str(i))
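A quick, network-free check of which comic numbers each loop visits. The question's loop covers 1882 down to 1863, which is 20 comics; because the stop value of range is exclusive, a start of 1862 would add a 21st, and a negative step keeps the original newest-first order.

```python
# Numbers requested by the original loop: 1882, 1881, ..., 1863
original = [1882 - i for i in range(0, 20)]

# Ascending range over the same 20 comics (the stop value is exclusive)
ascending = list(range(1863, 1883))

# Negative step: same comics, newest first, matching the original order
newest_first = list(range(1882, 1862, -1))

print(len(original), len(range(1862, 1883)))  # 20 21
print(sorted(original) == ascending)          # True
print(newest_first == original)               # True
```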
Upvotes: 0
Reputation: 78564
Your function returns control to the caller as soon as it reaches the return statement, which here happens in the first iteration of the for loop.
You can use yield instead of return in your function to produce the image links one at a time while keeping the for loop running:
import bs4
import requests

def getXkcdComic(comicUrl):
    for i in range(0,20):
        ...
        yield img['src']  # <- here

# make a list of links yielded by function
links = list(getXkcdComic('https://xkcd.com/'))
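To see what yield changes, here is a minimal sketch with a hypothetical fetch_img_src placeholder instead of the real scraping code. The generator does no work until a value is requested, and after each yield it resumes the loop where it paused; the fetched list is only there to make that visible.

```python
fetched = []  # records which comic numbers were "fetched", for illustration


def fetch_img_src(number):
    # Hypothetical placeholder for the requests + BeautifulSoup lookup
    fetched.append(number)
    return f"https://example.com/comic/{number}.png"


def get_xkcd_comics(latest=1882, count=20):
    for i in range(count):
        # yield hands one link to the caller and pauses here;
        # the loop continues from this point on the next request
        yield fetch_img_src(latest - i)


gen = get_xkcd_comics(count=3)
print(fetched)     # [] : creating the generator runs none of its body
print(next(gen))   # https://example.com/comic/1882.png
print(fetched)     # [1882]
print(list(gen))   # the remaining two links, for 1881 and 1880
```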
Upvotes: 0
Reputation: 733
How do you expect to get multiple outputs (URLs here) from a single function call? The for loop lets you iterate over a range and get multiple results, but it's of no use if the function returns after the first iteration. You can do the following:
import bs4
import requests

def getXkcdComic(comicUrl):
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        print(str(img['src']))  # print each link instead of returning it

getXkcdComic('https://xkcd.com/')
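One thing to keep in mind with this approach: the function now prints but never returns a value, so keeping the question's link = getXkcdComic(...) followed by print(link) would print None after the URLs. A minimal sketch, using a hypothetical print_links placeholder so no requests are made:

```python
def print_links(numbers):
    # Hypothetical stand-in for the printing version of getXkcdComic
    for n in numbers:
        print(f"https://example.com/comic/{n}.png")


link = print_links([1882, 1881])  # prints two URLs
print(link)                       # None: there is no return statement
```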
Upvotes: 0
Reputation: 7880
The first time you hit a return statement, the function is going to return, regardless of whether you're in a loop. So your for loop is going to get to the end of the first iteration, see the return, and that's it. The other 19 iterations won't run.
The reason you get a different URL if you dedent the return is that your for loop can now run to completion. But since you didn't save any of your previous iterations, it will return only the last one.
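Both behaviors can be reproduced without any network access. In this sketch, first_only and last_only are hypothetical names and a formatted string stands in for the scraped link; the only difference between the two functions is the indentation of return.

```python
def first_only(numbers):
    for n in numbers:
        link = f"comic-{n}"
        return link              # inside the loop: exits on the first pass


def last_only(numbers):
    for n in numbers:
        link = f"comic-{n}"      # overwritten on every pass
    return link                  # dedented: runs once, after the loop ends


numbers = [1882 - i for i in range(20)]
print(first_only(numbers))  # comic-1882
print(last_only(numbers))   # comic-1863
```

Note that last_only relies on link still being set after the loop, so it raises UnboundLocalError if numbers is empty.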
What it looks like you might want is to build a list of results, and return that.
def getXkcdComic(comicUrl):
    images = []  # Create an empty list for results
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        images.append(str(img['src']))  # Save the result by adding it to the list
    return images  # Return the list
Just remember, then, that link in your outer scope will actually be a list of links, and handle it accordingly.
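For example, with a hypothetical get_links stub standing in for the list-returning getXkcdComic (so no requests are made), the caller iterates over the result instead of treating it as a single string:

```python
def get_links():
    # Hypothetical stand-in for getXkcdComic('https://xkcd.com/'),
    # which now returns a list of 20 URLs rather than one string
    return [f"https://example.com/comic/{1882 - i}.png" for i in range(20)]


link = get_links()
print(len(link))   # 20
print(link[0])     # https://example.com/comic/1882.png (the newest)
for url in link:   # handle each URL on its own
    print(url)
```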
Upvotes: 3