Reputation: 28640
I'm still learning the ins and outs of Beautiful Soup.
I'm trying to create a data frame from http://www.nfl.com/injuries?week=1 that holds each player's name, position, and game/injury status. I've been trying to adapt code that I found, but I'm not getting anywhere. Any suggestions on where it's going wrong?
EDIT: After some more digging, my original problem was with the tags. It looks like the data sits inside a <script type="text/javascript"> tag, so I changed my selector to match. Now I'm getting closer, but I'm not sure how to pull out the relevant data. How do I extract the {player: " ", position: " ", ...} entries?
Below is the code, with a sample of what I'm trying to collect.
import bs4
import requests as re
import pandas as pd
alpha = re.get('http://www.nfl.com/injuries?week=1')
beta = bs4.BeautifulSoup(alpha.text,'lxml')
#print(beta)
gama = beta.findAll('script', {'type':"text/javascript"})
print(gama)
Sample output:
</script>, <script type="text/javascript">
nfl.use("node", "datatable", "datatable-sort", "mobile-panel", "overthrow",
"overthrow-shadows", "tabview", function(Y) {
var isTeamAway = false,
isTeamHome = false,
isTeam = false,
homeAbbr = 'DEN',
awayAbbr = 'LAC',
gameWeek = '1',
teamTabHome = Y.one('.colors-DEN-1'),
teamTabAway = Y.one('.colors-LAC-1'),
datatableHome = Y.one('.data-table-DEN-1'),
datatableAway = Y.one('.data-table-LAC-1');
var dataAway = [
{player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },
{player: "McGrath Sean ", position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892" },
{player: "Attaochu Jeremiah ", position: "DE", injury: "Hamstring", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Attaochu", firstName: "Jeremiah", esbId: "ATT290361" },
{player: "Boston Jayestin ", position: "S", injury: "Calf", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Boston", firstName: "Jayestin", esbId: "BOS695248" },
];
var dataHome = [
{player: "Booker Devontae ", position: "RB", injury: "Wrist", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Booker", firstName: "Devontae", esbId: "BOO019902" },
{player: "Talib Aqib ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Talib", firstName: "Aqib", esbId: "TAL428789" },
{player: "Paradis Matthew ", position: "C", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Paradis", firstName: "Matthew", esbId: "PAR002722" },
{player: "Kerr Zachariah ", position: "DT", injury: "Knee", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Kerr", firstName: "Zachariah", esbId: "KER593782" },
{player: "Peko Kyle ", position: "DT", injury: "Foot", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Peko", firstName: "Kyle", esbId: "PEK467819" },
{player: "Dixon Riley ", position: "P", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Dixon", firstName: "Riley", esbId: "DIX641722" },
{player: "Crick Jared ", position: "DE", injury: "Back", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Crick", firstName: "Jared", esbId: "CRI129618" },
{player: "Wolfe Derek ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Wolfe", firstName: "Derek", esbId: "WOL309455" },
{player: "Lynch Paxton ", position: "QB", injury: "right Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Lynch", firstName: "Paxton", esbId: "LYN526034" },
{player: "Gotsis Adam ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Gotsis", firstName: "Adam", esbId: "GOT428790" },
{player: "Thomas Demaryius ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Thomas", firstName: "Demaryius", esbId: "THO095855" },
{player: "Charles Jamaal ", position: "RB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Charles", firstName: "Jamaal", esbId: "CHA561428" },
];
Upvotes: 0
Views: 234
Reputation: 9440
You can use a regular expression (regex) like this:
import bs4
import requests
import pandas as pd
import re
alpha = requests.get('http://www.nfl.com/injuries?week=1')
beta = bs4.BeautifulSoup(alpha.text,'lxml')
gama = beta.findAll('script', {'type':"text/javascript"})
for g in gama:
    # re.search returns only the first {player ...} line in each script block
    match = re.search(r'\{player(.*)', g.text)
    if match:
        print(match.group(0))
Outputs:
{player: "Logan Bennie ", position: "DT", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Logan", firstName: "Bennie", esbId: "LOG113260" },
{player: "Pelon Claudeson ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Pelon", firstName: "Claudeson", esbId: "PEL747520" },
{player: "Pasztor Austin ", position: "T", injury: "Chest", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Pasztor", firstName: "Austin", esbId: "PAS822673" },
{player: "Flacco Joseph ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Flacco", firstName: "Joseph", esbId: "FLA009602" },
{player: "Dupree Alvin ", position: "LB", injury: "Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Questionable", lastName: "Dupree", firstName: "Alvin", esbId: "DUP507860" },
{player: "Palmer Carson ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Palmer", firstName: "Carson", esbId: "PAL249055" },
{player: "Bortles Robby ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Bortles", firstName: "Robby", esbId: "BOR650964" },
{player: "Cooper Amari ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Cooper", firstName: "Amari", esbId: "COO487703" },
{player: "Goode Najee ", position: "LB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Goode", firstName: "Najee", esbId: "GOO217526" },
{player: "Rogers Chester ", position: "WR", injury: "Hamstring", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Rogers", firstName: "Chester", esbId: "ROG146742" },
{player: "Vannett Nicholas ", position: "TE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Vannett", firstName: "Nicholas", esbId: "VAN643509" },
{player: "Norris Jared ", position: "LB", injury: "Groin", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Norris", firstName: "Jared", esbId: "NOR463803" },
{player: "Apple Eli ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Apple", firstName: "Eli", esbId: "APP195645" },
{player: "Anthony Stephone ", position: "LB", injury: "Ankle", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Anthony", firstName: "Stephone", esbId: "ANT204590" },
{player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },
Note: because I import the re module for the regex, I had to change your "import requests as re" to a plain "import requests".
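If you want to go from the matched text to rows you can load into a data frame, here is a rough sketch rather than a definitive solution. It assumes every entry sits on one line and that no value contains a closing brace or a double quote, which holds for the sample in the question but may not hold for the live page. The names parse_players, OBJECT_RE and PAIR_RE are my own, not part of any library.

```python
import re

# Matches one JavaScript object literal such as {player: "...", position: "..."}.
# Assumption: each literal is on a single line and no value contains "}".
OBJECT_RE = re.compile(r'\{\s*player:.*?\}')
# Matches one key: "value" pair inside such a literal.
PAIR_RE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_players(script_text):
    """Return one dict per {player: ...} literal found in script_text."""
    rows = []
    for literal in OBJECT_RE.findall(script_text):
        # Strip the trailing space the page leaves after each player name.
        row = {key: value.strip() for key, value in PAIR_RE.findall(literal)}
        rows.append(row)
    return rows


# Two lines copied from the sample in the question.
sample = '''
var dataAway = [
{player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },
{player: "McGrath Sean ", position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892" },
];
'''

rows = parse_players(sample)
for row in rows:
    print(row["player"], row["position"], row["gameStatus"])
```

In the scraping script you would call parse_players(g.text) for each g in gama and collect all the rows in one list. pd.DataFrame(rows) accepts a list of dicts like this, and you can then select the player, position and gameStatus columns. Unlike re.search, findall picks up every entry in a script block, not just the first one.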
Upvotes: 1