Reputation: 39
I am trying to scrape company names from multiple pages on a site. I am using a for loop to move through each page and find the company name.
    ### CREATING LOOP TO GO THROUGH PAGES ###
    results = [] #variable to store loop results
    for i in range (4): #goes through 4 pages (0-3)
        url = 'https://clutch.co/it-services/msp?page={}'.format(i) #passes the number inside range through the {}
        session = HTMLSession()
        resp = session.get(url)
        resp.html.render() #RENDERS INCASE ITS JAVASCRIPT SITE
        soup = BeautifulSoup(resp.html.html, features='lxml')
        print(url) #shows what page you are on as it is looping
        agencies = soup.find_all(class_='company-name')
        for a in agencies:
            text = (a.text)
        results.append(text)
    print(results)
The results of the code above only display the last element of each page as text.
RESULTS:
https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']
My understanding is that this happens because the nested for loop only keeps one element. What is the proper way to get the text of every element on all the pages?
Thanks in advance.
Upvotes: 1
Views: 234
Reputation: 234
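This is because the statement that appends each entry to the results list is outside the inner for loop. It runs only once per page, after the inner loop has finished, so text holds just the last company name on that page when it is appended.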
Try this:
    from bs4 import BeautifulSoup
    from requests_html import HTMLSession

    ### CREATING LOOP TO GO THROUGH PAGES ###
    results = []  # stores the company names from every page
    for i in range(4):  # goes through 4 pages (0-3)
        url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # inserts the page number into the {}
        session = HTMLSession()
        resp = session.get(url)
        resp.html.render()  # renders the page in case the site relies on JavaScript
        soup = BeautifulSoup(resp.html.html, features='lxml')
        print(url)  # shows which page is being processed
        agencies = soup.find_all(class_='company-name')
        for a in agencies:
            text = a.text
            results.append(text)  # inside the inner loop: runs once per company
    print(results)
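
To see the effect of the indentation without hitting the site, here is a minimal sketch that uses hard-coded lists in place of the scraped pages. The pages data and the function names collect_last_only and collect_all are made up for illustration; each inner list stands in for what soup.find_all(class_='company-name') would return on one page. The second function also calls strip() to drop the leading newline and trailing space seen in the question's output, which is an optional extra.

```python
# Hypothetical stand-in for the scraped pages (not real data from the site).
pages = [
    ['\nAlpha Co ', '\nBeta Ltd '],
    ['\nGamma Inc ', '\nDelta LLC '],
]

def collect_last_only(pages):
    """Append outside the inner loop: one entry per page (the last one)."""
    results = []
    for agencies in pages:
        for a in agencies:
            text = a
        results.append(text)  # runs once per page, after the inner loop ends
    return results

def collect_all(pages):
    """Append inside the inner loop: one entry per company."""
    results = []
    for agencies in pages:
        for a in agencies:
            results.append(a.strip())  # strip() removes surrounding whitespace
    return results

print(collect_last_only(pages))  # ['\nBeta Ltd ', '\nDelta LLC ']
print(collect_all(pages))        # ['Alpha Co', 'Beta Ltd', 'Gamma Inc', 'Delta LLC']
```

The only difference between the two functions is where the append call sits relative to the inner loop, which is the same change as in the corrected code above.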
Upvotes: 3