Nested For Loops - Beautiful Soup Text

Question

I am trying to scrape company names from multiple pages on a site. I am using a for loop to move through each page and find the company name.

### CREATING LOOP TO GO THROUGH PAGES ###

results = [] #variable to store loop results
for i in range (4): #goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i) #passes the number inside range through the {}
    session = HTMLSession() 
    resp = session.get(url)
    resp.html.render() #renders in case it's a JavaScript site
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url) #shows what page you are on as it is looping
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = (a.text)
    results.append(text)

print(results)

The results of the code above only display the last element of each page as text.

RESULTS:

https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']

My understanding is that this happens because the nested for loop only keeps one element. What would be the proper way to get the text of every element on all the pages?

Thanks in advance.

ahmadfaraz · Accepted Answer

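This is because the statement that appends each entry to the results list is outside the inner for loop. It therefore runs only once per page, after the inner loop has finished, and appends whatever value text held last, which is the final company on that page. Indent the append so it sits inside the inner loop.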

Try this:

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = [] #variable to store loop results
for i in range(4): #goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i) #passes the number inside range through the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render() #renders in case it's a JavaScript site
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url) #shows what page you are on as it is looping
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
        results.append(text) #inside the inner loop, so every company name is stored

print(results)
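
To see the effect of the indentation without hitting the site, here is a minimal sketch that swaps the scraped pages for hard-coded lists. The names pages, collect_last_only and collect_all are made up for this illustration and are not part of the original code; the sample strings are placeholders, not real scraped data. The second function also calls strip(), which is an optional extra to drop the leading newline and trailing space seen in the output above.

```python
# Hypothetical stand-in data: each inner list plays the role of the
# texts found in `agencies` on one page (no network access needed).
pages = [
    ["\nAlpha ", "\nBeta "],
    ["\nGamma ", "\nDelta "],
]


def collect_last_only(pages):
    """Append sits after the inner loop: one item per page is kept."""
    results = []
    for agencies in pages:
        for a in agencies:
            text = a
        # Runs once per page, so only the last value of `text` is stored.
        # (Would raise NameError if the very first page were empty.)
        results.append(text)
    return results


def collect_all(pages):
    """Append sits inside the inner loop: every item is kept."""
    results = []
    for agencies in pages:
        for a in agencies:
            # strip() is an optional cleanup of surrounding whitespace.
            results.append(a.strip())
    return results


print(collect_last_only(pages))  # ['\nBeta ', '\nDelta ']
print(collect_all(pages))        # ['Alpha', 'Beta', 'Gamma', 'Delta']
```

The only difference between the two functions that matters for the question is where results.append is indented.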
