moon17

Reputation: 11

Iterate through a list to download items in Python

I previously posted a question asking for help with a Python script and didn't get much feedback, which is okay, because I figured out most of it myself. I'm still running into some trouble, though.

My script currently is like this:

param1 = 
param2 = 
param3 = 

requestURL = "http://examplewebpage.com/live2/?target=param1&query=param2&other=param3"

html_content = urllib2.urlopen(requestURL).read()

matches = re.findall('<URL>(.*?)</URL>', html_content);

myList=[matches]

i = 0
while i < len(myList):
    testfile = urllib.URLopener()
    testfile.retrieve(myList[i], "/Users/example/file/location/newtest")
    i += 1

This successfully retrieves all the URLs from the web page, but the download step fails with the following error: 'list' object has no attribute 'strip'

Can anyone think of a better way to do this? Or is there a different data type I should be using other than a list?

Upvotes: 0

Views: 601

Answers (1)

user94559

Reputation: 60153

I think the main problem is that myList=[matches] creates a new list with exactly one element in it. That single element is itself a list of matches.

So when you later access myList[0] in your loop, it's actually a list. Hence the error.
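To make the nesting concrete, here is a minimal sketch. The two URLs are made-up stand-ins for whatever re.findall actually returns on your page; only the shape of the data matters.

```python
# Hypothetical stand-in for the result of re.findall(...): a list of URL strings
matches = ["http://example.com/a.xml", "http://example.com/b.xml"]

wrapped = [matches]  # a new list whose single element is the whole matches list
direct = matches     # the list of URL strings itself

# wrapped has one element, and that element is a list, not a string
print(len(wrapped), isinstance(wrapped[0], list))

# direct has one element per URL, and each element is a string
print(len(direct), isinstance(direct[0], str))
```

Passing wrapped[0] to retrieve hands it a list where it expects a URL string, which is where the 'strip' error comes from.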

Assuming the rest of your code is correct, I think things will probably work if you just switch to myList=matches, but here's a version that uses clearer variable names and a for loop:

import re
import urllib
import urllib2

requestURL = "http://examplewebpage.com/live2/?target=param1&query=param2&other=param3"

html_content = urllib2.urlopen(requestURL).read()

# findall returns a list of URL strings, so iterate over it directly
matches = re.findall('<URL>(.*?)</URL>', html_content)

for url in matches:
    testfile = urllib.URLopener()
    testfile.retrieve(url, "/Users/example/file/location/newtest")

EDIT

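Of course, every download is going to be written to the same file path, so each one overwrites the previous one. URLopener.retrieve writes to the filename you pass it and does not rename files automatically, so you will probably want a different filename for each URL.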
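One possible way to avoid writing every download to the same file is to pair each URL with its own numbered path before downloading. This is only a sketch: numbered_targets, the "download_" prefix, and the example URLs are hypothetical names chosen for illustration, and the directory is the one from the question.

```python
import os


def numbered_targets(urls, dest_dir, prefix="download_"):
    """Pair each URL with a distinct destination path (hypothetical helper)."""
    return [
        (url, os.path.join(dest_dir, "%s%d" % (prefix, index)))
        for index, url in enumerate(urls)
    ]


# Made-up URLs standing in for the real list of matches
urls = ["http://example.com/a.xml", "http://example.com/b.xml"]
targets = numbered_targets(urls, "/Users/example/file/location")

for url, path in targets:
    # No download happens here; this only shows which path each URL would get
    print("%s -> %s" % (url, path))
```

In the loop above, each (url, path) pair could then be passed to testfile.retrieve(url, path) in place of the fixed "newtest" path.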

Upvotes: 1
