Tdakers

Reputation: 43

Web Scraping through multiple Web Addresses

I am trying to iterate through several web pages in a single script. However, it only pulls back the data from the last URL in my list.

Here is my current code:

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URLS = ['https://sc2replaystats.com/replay/playerStats/11116819/1809336', 'https://sc2replaystats.com/replay/playerStats/11116819/1809336']

for URL in URLS:
  response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')

tb = soup.find('table', class_='table table-striped table-condensed')
for link in tb.find_all('tr'):
    name = link.find('span')
    if name is not None:
        print(name['title'])

The results are:

Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot
Supplydepot

But I was expecting:

Nexus
Pylon
Gateway
Assimilator
Cyberneticscore
Pylon
Assimilator
Nexus
Roboticsfacility
Pylon
Shieldbattery
Gateway
Gateway
Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot
Supplydepot

Upvotes: 1

Views: 38

Answers (1)

connorjohnson

Reputation: 62

To go with what @RomanPerekhrest is saying: inside the for loop you have

for URL in URLS:
  response = requests.get(URL) 

This means that you're overwriting response on each iteration, so only the last response is left when the loop finishes. One way around this is to make a list called responses and append each response to it, like this:

responses = []
for URL in URLS:
    # Keep every response instead of overwriting the previous one
    response = requests.get(URL)
    responses.append(response)

# Parse each saved response in turn
for response in responses:
    soup = BeautifulSoup(response.content, 'html.parser')

    tb = soup.find('table', class_='table table-striped table-condensed')
    for link in tb.find_all('tr'):
        name = link.find('span')
        if name is not None:
            print(name['title'])
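
Alternatively, you could do the parsing inside the loop, so there is nothing to store between iterations. Here is a minimal sketch, assuming the same table markup as in the question. The names building_titles, scrape_all and the fetch parameter are made up for illustration; fetch is passed in so the loop can be tried without hitting the site.

```python
from bs4 import BeautifulSoup


def building_titles(html):
    """Return the title of the first <span> in each row of the stats table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='table table-striped table-condensed')
    if table is None:
        # Assumption: a page without the table simply contributes nothing
        return []
    titles = []
    for row in table.find_all('tr'):
        span = row.find('span')
        if span is not None and span.has_attr('title'):
            titles.append(span['title'])
    return titles


def scrape_all(urls, fetch):
    """Fetch and parse each URL inside the loop so no page is dropped."""
    all_titles = []
    for url in urls:
        html = fetch(url)  # hypothetical callable: takes a URL, returns HTML
        all_titles.extend(building_titles(html))
    return all_titles
```

With requests, fetch could be something like `lambda url: requests.get(url).content`, and you would then print each entry of `scrape_all(URLS, fetch)`.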

Upvotes: 1
