Reputation: 487
Here is my current code to scrape specific player data from a site:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
import lxml
import xlsxwriter
page = requests.get('https://www.futbin.com/players?page=1')
soup = BeautifulSoup(page.content, 'lxml')
pool = soup.find(id='repTb')
pnames = pool.find_all(class_='player_name_players_table')
pprice = pool.find_all(class_='ps4_color font-weight-bold')
prating = pool.select('span[class*="form rating ut20"]')
all_player_names = [name.getText() for name in pnames]
all_prices = [price.getText() for price in pprice]
all_pratings = [rating.getText() for rating in prating]
fut_data = pd.DataFrame(
    {
        'Player': all_player_names,
        'Rating': all_pratings,
        'Price': all_prices,
    })
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
fut_data.to_excel(writer,'Futbin')
writer.save()
print(fut_data)
This works fine for the first page, but I need to go through 609 pages in total and collect the data from all of them.
Could you please help me rewrite this code to do that? I am still new and learning with this project.
Upvotes: 0
Views: 163
Reputation: 8260
You can iterate over all 609 pages, parse each page, and at the end save the collected data to file.xlsx:
import requests
from bs4 import BeautifulSoup
import pandas as pd
all_player_names = []
all_pratings = []
all_prices = []
for i in range(1, 610):
    # fetch and parse one listing page
    page = requests.get('https://www.futbin.com/players?page={}'.format(i))
    soup = BeautifulSoup(page.content, 'lxml')
    pool = soup.find(id='repTb')

    # pull the name, price and rating cells from the players table
    pnames = pool.find_all(class_='player_name_players_table')
    pprice = pool.find_all(class_='ps4_color font-weight-bold')
    prating = pool.select('span[class*="form rating ut20"]')

    # accumulate the results across pages
    all_player_names.extend([name.getText() for name in pnames])
    all_prices.extend([price.getText() for price in pprice])
    all_pratings.extend([rating.getText() for rating in prating])

fut_data = pd.DataFrame({'Player': all_player_names,
                         'Rating': all_pratings,
                         'Price': all_prices})

# write everything to a single Excel sheet once all pages have been collected
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
fut_data.to_excel(writer, 'Futbin')
writer.save()
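One thing to keep in mind: 609 back-to-back requests take a while and can occasionally fail or get rate-limited, so you may want to pause briefly between pages and skip pages that don't come back cleanly. Here is a minimal sketch of how the loop body could be hardened; the one-second delay and the skip-on-error behaviour are my own assumptions, not anything the site requires:
import time

import requests
from bs4 import BeautifulSoup

for i in range(1, 610):
    page = requests.get('https://www.futbin.com/players?page={}'.format(i))
    if page.status_code != 200:
        # skip pages that fail instead of crashing the whole run
        # (assumption: partial data is acceptable)
        print('Skipping page {} (status {})'.format(i, page.status_code))
        continue

    soup = BeautifulSoup(page.content, 'lxml')
    pool = soup.find(id='repTb')
    if pool is None:
        # the players table may be missing on a blocked or broken page
        print('No player table found on page {}'.format(i))
        continue

    # ... same parsing and extend() calls as above ...

    time.sleep(1)  # assumed 1-second pause between requests to be polite to the server
The rest of the script (building the DataFrame and saving file.xlsx) stays exactly the same.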
Upvotes: 1