Trying to loop through a list of URLs and scrape each page for text

Question

I'm having an issue. My code loops through the list of URLs, but it's not adding the text content of each scraped page to the presults list.

I haven't gotten to the raw text processing yet. I'll probably post a separate question for that once I get there if I can't figure it out.

What is wrong here? The length of presults remains at 1 even though it seems to be looping through the list of urls for the scrape...

Here's part of the code I'm having an issue with:

counter=0
for xa in range(0,len(qresults)):
        pageURL=qresults[xa].format()
        pageresp= requests.get(pageURL, headers=headers)
        if pageresp.status_code==200:
                print(pageURL)
                psoup=BeautifulSoup(pageresp.content, 'html.parser')
                presults=[]
                para=psoup.text
                presults.append(para)
                print(len(presults))
        else: print("Could not reach domain")
print(len(presults))

Prune · Accepted Answer

Your immediate problem is here:

            presults=[]
            para=psoup.text
            presults.append(para)

On every for iteration, you replace your existing presults list with the empty list and add one item. On the next iteration, you again wipe out the previous result.
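
The same effect can be reproduced without any network calls. This is only a minimal sketch: the `pages` list is a made-up stand-in for the scraped text, not anything from the original code.

```python
# Placeholder strings standing in for the text of three scraped pages (hypothetical data)
pages = ["first page", "second page", "third page"]

for text in pages:
    presults = []          # rebinds presults to a brand-new empty list on every pass
    presults.append(text)  # so the list only ever holds the current item

print(len(presults))  # 1
print(presults)       # ['third page']
```

Only the last item survives, which matches the length of 1 reported in the question.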

The initialization must be done only once, before the loop starts:

presults = []
for xa in range(0,len(qresults)):
    # ... same loop body as before, minus the presults=[] line
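
For reference, here is one way the whole loop might look with that change applied. Treat it as a sketch under assumptions, not the only way to write it: the function names `collect_page_texts` and `fetch_text_with_requests` are hypothetical, the `timeout` value is my addition, and I've split the fetching into its own function so the accumulation logic can be tried without hitting the network.

```python
def collect_page_texts(urls, fetch_text):
    """Return the text of every URL that fetch_text could retrieve."""
    presults = []  # created once, before the loop, so entries accumulate
    for page_url in urls:
        text = fetch_text(page_url)
        if text is None:
            print("Could not reach domain:", page_url)
            continue
        presults.append(text)
    return presults


def fetch_text_with_requests(page_url, headers=None):
    """Fetch one page and return its text, or None on a non-200 response."""
    # Third-party imports are kept local so collect_page_texts can be
    # exercised on its own without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    pageresp = requests.get(page_url, headers=headers, timeout=10)  # timeout is an assumption
    if pageresp.status_code != 200:
        return None
    psoup = BeautifulSoup(pageresp.content, "html.parser")
    return psoup.get_text()
```

With the list from the question, the call would be something like `presults = collect_page_texts(qresults, fetch_text_with_requests)`. Iterating over the URLs directly also avoids the `range(0,len(qresults))` indexing, though that part is a style choice and not related to the bug.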
