I have scraped some data but need help parsing it correctly. I am still learning, so I would appreciate any advice.
I only want the data for two columns: TEAM and SA/G.
Here is my code so far:
```python
# import modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# set path for driver (raw string so the backslashes are not read as escapes)
driver = webdriver.Chrome(service=Service(r'C:\webdrivers\chromedriver.exe'))
# open page
driver.get('http://www.espn.com/nhl/statistics/team/_/stat/scoring/sort/avgGoals')
# parse the rendered page source
soup = BeautifulSoup(driver.page_source, 'lxml')
# close the browser and end the driver session
driver.quit()

# grab table data
table = soup.find(class_='tablehead')

# parse data (prints every cell, so extra data is included)
for td in table.find_all('td'):
    print(td.text)
```
This gets the correct data, but it also prints every other column in the table. How can I extract only the TEAM and SA/G values?
Here is an example of the pandas DataFrame output I am looking for:
```
Team        SA/G
Nashville   30.1
Colorado    33.6
Washington  31.0
```
Thanks in advance for any help that you may offer!
CODE UPDATE:
My first attempt grabbed only the team names, and it also picked up extra values ("GP", for example).
1st attempt at fixing code:
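```python
# parse data (closer to desired output but missing SA/G data)
for tr in table.find_all('tr'):
    # print the text of the first link in each row: the team name in data rows,
    # but a column heading such as "GP" in header rows
    if tr.a is not None:
        print(tr.a.text)
```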
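One way to extend that row loop so it returns both columns is sketched below. It assumes, from the ESPN column order (RK, TEAM, GP, G, GA, GF/G, GA/G, DIFF, SF/G, SA/G, ...), that TEAM is the 2nd cell and SA/G the 10th cell of each row, and that the repeated header rows can be recognized by the literal text "TEAM". The helper name team_shots_against and the sample HTML are hypothetical; the sample uses Python's built-in html.parser so it runs without lxml. With the live page you would pass the table object from the code above instead.

```python
from bs4 import BeautifulSoup

# Assumed column positions, taken from the ESPN column order
# RK, TEAM, GP, G, GA, GF/G, GA/G, DIFF, SF/G, SA/G, ...
TEAM_COL = 1
SA_G_COL = 9


def team_shots_against(table):
    """Return (team, SA/G) pairs from a stats <table>, skipping header rows."""
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        # Skip short rows (e.g. a title row) and the repeated "RK TEAM GP ..." headers.
        if len(cells) <= SA_G_COL or cells[TEAM_COL] == 'TEAM':
            continue
        rows.append((cells[TEAM_COL], float(cells[SA_G_COL])))
    return rows


# Hypothetical HTML that mimics a few rows of the ESPN table.
sample_html = """
<table class="tablehead">
  <tr><td>RK</td><td>TEAM</td><td>GP</td><td>G</td><td>GA</td>
      <td>GF/G</td><td>GA/G</td><td>DIFF</td><td>SF/G</td><td>SA/G</td></tr>
  <tr><td>1</td><td><a href="#">Nashville</a></td><td>11</td><td>45</td><td>33</td>
      <td>4.09</td><td>3.00</td><td>1.09</td><td>31.9</td><td>30.1</td></tr>
  <tr><td>2</td><td><a href="#">Colorado</a></td><td>11</td><td>44</td><td>30</td>
      <td>4.00</td><td>2.73</td><td>1.27</td><td>31.4</td><td>33.6</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
print(team_shots_against(soup.find(class_='tablehead')))
# prints [('Nashville', 30.1), ('Colorado', 33.6)]
```

Checking the cell text rather than a CSS class keeps the sketch independent of how the site marks up its header rows, at the cost of relying on the "TEAM" label staying the same.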
My second attempt grabbed both the team and SA/G data, but it also included extra rows: the "TEAM" and "SA/G" header text repeats every 11 rows.
Here is the 2nd attempt:
```python
# parses TEAM and SA/G
import pandas as pd

x = pd.read_html("http://www.espn.com/nhl/statistics/team/_/stat/scoring/sort/avgGoals")[0]
print(x[[1, 9]])
```
If you want to read a table from a URL, I would use the read_html function from pandas. It parses the web page for you: by default it uses lxml, and it falls back to BeautifulSoup (bs4) with html5lib if lxml is not available. You can see an example of this below:
```
In [3]: import pandas as pd
In [4]: pd.read_html("http://www.espn.com/nhl/statistics/team/_/stat/scoring/sort/avgGoals")[0]
Out[4]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 RK TEAM GP G GA GF/G GA/G DIFF SF/G SA/G DIFF SVPCT PIM PIMA DIFF
1 1 Nashville 11 45 33 4.09 3.00 1.09 31.9 30.1 01.8 .900 87 109 -22
2 2 Colorado 11 44 30 4.00 2.73 1.27 31.4 33.6 -02.3 .919 102 140 -38
3 3 Washington 13 49 43 3.77 3.31 0.46 30.3 31.0 -00.7 .893 125 111 14
4 4 Vancouver 11 40 26 3.64 2.36 1.27 32.6 31.3 01.4 .924 103 119 -16
5 NaN Montreal 11 40 35 3.64 3.18 0.45 34.4 31.1 03.3 .898 77 83 -6
6 6 Toronto 13 46 44 3.54 3.38 0.15 32.7 32.8 -00.1 .897 88 82 6
7 7 Florida 12 42 45 3.50 3.75 -0.25 34.0 30.0 04.0 .875 78 86 -8
8 NaN Philadelphia 10 35 30 3.50 3.00 0.50 35.4 27.4 08.0 .891 78 90 -12
9 9 Buffalo 13 43 32 3.31 2.46 0.85 30.2 33.5 -03.2 .926 100 118 -18
10 10 Tampa Bay 10 33 32 3.30 3.20 0.10 31.4 34.5 -03.1 .907 100 88 12
11 RK TEAM GP G GA GF/G GA/G DIFF SF/G SA/G DIFF SVPCT PIM PIMA DIFF
12 11 Boston 11 36 23 3.27 2.09 1.18 33.3 31.5 01.7 .934 82 80 2
13 NaN Carolina 11 36 29 3.27 2.64 0.64 32.9 29.4 03.5 .910 97 87 10
14 13 Pittsburgh 12 39 30 3.25 2.50 0.75 31.9 29.8 02.1 .916 82 84 -2
15 14 NY Rangers 9 29 34 3.22 3.78 -0.56 28.2 36.9 -08.7 .898 90 82 8
16 15 St. Louis 12 37 38 3.08 3.17 -0.08 29.0 30.3 -01.3 .895 87 91 -4
17 16 Vegas 13 40 36 3.08 2.77 0.31 35.3 32.7 02.6 .915 143 143 0
18 17 Edmonton 12 36 32 3.00 2.67 0.33 27.9 30.6 -02.7 .913 80 74 6
19 NaN Arizona 11 33 24 3.00 2.18 0.82 31.5 29.8 01.6 .927 68 74 -6
20 NaN NY Islanders 11 33 27 3.00 2.45 0.55 27.6 31.5 -03.8 .922 95 67 28
21 20 Columbus 11 30 39 2.73 3.55 -0.82 33.6 31.1 02.5 .886 75 81 -6
22 RK TEAM GP G GA GF/G GA/G DIFF SF/G SA/G DIFF SVPCT PIM PIMA DIFF
23 21 Ottawa 11 29 36 2.64 3.27 -0.64 31.1 35.0 -03.9 .906 134 110 24
24 22 Calgary 13 34 39 2.62 3.00 -0.38 30.9 31.2 -00.3 .904 147 122 25
25 23 San Jose 12 31 43 2.58 3.58 -1.00 28.3 31.8 -03.4 .887 128 124 4
26 NaN Los Angeles 12 31 49 2.58 4.08 -1.50 37.3 28.3 08.9 .856 102 116 -14
27 25 Winnipeg 12 30 37 2.50 3.08 -0.58 33.2 33.3 -00.1 .907 52 88 -36
28 NaN Chicago 10 25 30 2.50 3.00 -0.50 31.6 32.9 -01.3 .909 66 68 -2
29 27 Anaheim 13 32 31 2.46 2.38 0.08 27.5 31.5 -04.0 .924 131 99 32
30 28 New Jersey 9 22 34 2.44 3.78 -1.33 29.3 29.0 00.3 .870 99 93 6
31 29 Minnesota 11 26 37 2.36 3.36 -1.00 29.5 30.4 -00.8 .889 87 93 -6
32 30 Detroit 12 27 45 2.25 3.75 -1.50 31.5 33.2 -01.7 .887 105 96 9
33 RK TEAM GP G GA GF/G GA/G DIFF SF/G SA/G DIFF SVPCT PIM PIMA DIFF
34 31 Dallas 13 25 35 1.92 2.69 -0.77 27.8 28.8 -01.1 .907 89 79 10
```
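To get from that raw output to the two-column DataFrame in the question, you can keep columns 1 and 9, drop the repeated header rows (rows 0, 11, 22 and 33 above), and convert SA/G to numbers. This is a minimal sketch assuming the integer column labels shown in the output above; team_sa_per_game is a hypothetical helper, and raw is a hand-typed excerpt of that output (first ten columns only) so the example runs without fetching the page.

```python
import pandas as pd


def team_sa_per_game(raw: pd.DataFrame) -> pd.DataFrame:
    """Reduce the raw read_html table to Team and SA/G columns.

    Assumes the integer column labels shown above: 1 = TEAM, 9 = SA/G.
    """
    return (
        # drop the repeated "RK TEAM GP ..." header rows and keep two columns
        raw.loc[raw[1] != 'TEAM', [1, 9]]
        .rename(columns={1: 'Team', 9: 'SA/G'})
        # convert SA/G from text to floats
        .astype({'SA/G': float})
        .reset_index(drop=True)
    )


# Hand-typed excerpt of the read_html output above (first ten columns only).
raw = pd.DataFrame([
    ['RK', 'TEAM', 'GP', 'G', 'GA', 'GF/G', 'GA/G', 'DIFF', 'SF/G', 'SA/G'],
    ['1', 'Nashville', '11', '45', '33', '4.09', '3.00', '1.09', '31.9', '30.1'],
    ['2', 'Colorado', '11', '44', '30', '4.00', '2.73', '1.27', '31.4', '33.6'],
    ['3', 'Washington', '13', '49', '43', '3.77', '3.31', '0.46', '30.3', '31.0'],
    ['RK', 'TEAM', 'GP', 'G', 'GA', 'GF/G', 'GA/G', 'DIFF', 'SF/G', 'SA/G'],
    ['11', 'Boston', '11', '36', '23', '3.27', '2.09', '1.18', '33.3', '31.5'],
])

print(team_sa_per_game(raw))
```

With the live page you would call team_sa_per_game on the DataFrame returned by read_html. Passing header=0 to read_html would only turn the first header row into column names; the repeated header rows inside the table would still need to be filtered out.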