Lincoln Kupke

Reputation: 63

Python Beautiful Soup retrieve multiple webpages of info

I am trying to learn web scraping and was wondering how to get information from multiple web pages. I am practicing on http://www.cfbstats.com/2014/player/index.html. I want to retrieve all the teams, then follow each team's link, which shows the roster, and then retrieve each player's info and, from the player's personal link, their stats.

What I have so far is:

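import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = "http://www.cfbstats.com/2014/player/index.html"
r = requests.get(base_url)
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a")
for link in links:
    college = link.text
    href = link.get("href")  # the target URL is stored in the href attribute
    if href is None:
        continue  # skip anchors that have no target
    collegeurl = urljoin(base_url, href)  # resolve relative links against the index page
    c = requests.get(collegeurl)
    campbells = BeautifulSoup(c.content, "html.parser")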

Then I am lost from there. I know I need a nested for loop, but I don't want certain links, such as terms and conditions and social networks. I am just trying to get each player's info and then their stats, which are linked from their name.

Upvotes: 2

Views: 215

Answers (2)

nickie

Reputation: 5818

You have to filter the links somehow and limit your for loop to the ones that correspond to teams. Then you need to do the same to get the links to players. Using Chrome's "Developer tools" (or your browser's equivalent), I suggest that you right-click and inspect one of the links that interest you, then try to find something that distinguishes it from the links that do not. For instance, here is what you'll find on the CFBstats pages:

  1. All team links are inside <div class="conference">. Furthermore, they all contain the substring "/team/" in the href. So, you can either select the links contained in such a div (for example with XPath or a CSS selector), or filter for links with that substring, or both.

  2. On team pages, player links are in <td class="player-name">.

These two should suffice. If not, you get the gist. Web crawling is an experimental science...
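
The two observations above could be turned into code roughly as follows. This is only a sketch: it assumes the page structure described in this answer (team links inside div.conference with "/team/" in the href, player links inside td.player-name). The function names team_links and player_links are made up for illustration, and the sample HTML and its paths are invented stand-ins for the real pages.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.cfbstats.com/2014/player/index.html"

# Invented sample markup that mimics the structure described above.
SAMPLE_INDEX = (
    '<div class="conference">'
    '<a href="/2014/team/721/index.html">Air Force</a>'
    '<a href="/2014/conference/1/index.html">Conference page</a>'
    '</div>'
    '<a href="/terms.html">Terms and conditions</a>'
)
SAMPLE_ROSTER = (
    '<table><tr><td class="player-name">'
    '<a href="/2014/player/721/1234/index.html">John Doe</a>'
    '</td></tr><tr><td class="player-name">No Link</td></tr></table>'
)


def team_links(html, base_url):
    """Return (team name, absolute URL) pairs for team links on the index page."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    # Only look at links inside <div class="conference"> ...
    for a in soup.select("div.conference a[href]"):
        # ... and keep only those whose href contains "/team/".
        if "/team/" in a["href"]:
            found.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return found


def player_links(html, base_url):
    """Return (player name, absolute URL) pairs for links in td.player-name cells."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.select("td.player-name a[href]")
    ]


print(team_links(SAMPLE_INDEX, BASE_URL))
print(player_links(SAMPLE_ROSTER, BASE_URL))
```

For the real site, the html argument would be the content of a page fetched with requests.get(url), and the nested loop becomes: for each team URL, fetch the roster page and call player_links on it.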

Upvotes: 1

Xiaotian Pei

Reputation: 3260

I'm not familiar with BeautifulSoup, but you can certainly use regular expressions to retrieve the data you want.
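
As a rough illustration of that idea, a regular expression can pull out href values that contain "/team/". This is a minimal sketch, not a robust parser: it assumes hrefs are written with double quotes, the name find_team_hrefs is made up for this example, and the sample HTML is invented. Regular expressions tend to break on irregular markup, so treat this as a quick filter only.

```python
import re

# Matches href="..." attributes whose value contains "/team/".
TEAM_HREF = re.compile(r'href="([^"]*/team/[^"]*)"')

# Invented sample markup; the real page will differ.
SAMPLE_HTML = (
    '<a href="/2014/team/721/index.html">Air Force</a> '
    '<a href="/terms.html">Terms and conditions</a>'
)


def find_team_hrefs(html):
    """Return every double-quoted href value in html that contains "/team/"."""
    return TEAM_HREF.findall(html)


print(find_team_hrefs(SAMPLE_HTML))
```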

Upvotes: 0
