Jared McCallister

Reputation: 129

Web Scraping to .csv

I have been using the following script to scrape some data from a website and export it to a .csv file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

soup = BeautifulSoup(res.text, 'html.parser')

table = soup.find("table", class_="table_list playerslist tablesaw trhover")

columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]

data = []

table.find("thead").extract()

for tr in table.find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])

df = pd.DataFrame(data, columns=columns)

df.to_csv("S10-NA-AVGs.csv", index=False)

I am having trouble adapting this same script to collect other data and export it to .csv. The page in question is: https://gol.gg/game/stats/25989/page-fullstats/

I understand that the data is laid out differently in the HTML, and that is where I am a little mixed up about what the script needs to look for. The individual fields seem to be stored in different elements, so I tried changing this line:

columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]

That is where I am receiving the error message:

AttributeError: 'NoneType' object has no attribute 'find'

I tried changing "th" and "thead" to a few different variations, but was unsuccessful.

Upvotes: 2

Views: 1201

Answers (3)

baduker

Reputation: 20052

How about using pandas to get the whole job done, since you already use it?

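from io import StringIO

import requests
import pandas as pd

res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')

# read_html returns a list of DataFrames, one per <table> found on the page.
# Wrapping the HTML in StringIO avoids the deprecation warning that recent
# pandas versions emit when given a literal HTML string.
tables = pd.read_html(StringIO(res.text), skiprows=[0])
df = pd.concat(tables)
df.to_csv("data.csv", index=False)
print(df)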

Output:

                       Player   Huni Svenskeren  ...  Ryoma Cody Sun    Poome
0                       Role    TOP     JUNGLE  ...    MID      ADC  SUPPORT
1                      Kills      2          0  ...      5        4        2
2                     Deaths      5          6  ...      2        2        1
3                    Assists      3          5  ...     10       12       16
4                        KDA      1        0.8  ...    7.5        8       18
5                         CS    186        136  ...    210      217       27
6        CS in Team's Jungle      4         80  ...      8        8        0
7         CS in Enemy Jungle      0          0  ...      0        6        0
8                        CSM    7.6        5.5  ...    8.6      8.8      1.1
9                      Golds   8723       7059  ...  11074    11275     7255
10                       GPM    355        288  ...    451      459      296
11                     GOLD%  21.9%      17.7%  ...  20.5%    20.8%    13.4%
12              Vision Score     14         24  ...     27       37       52
13              Wards placed      7          7  ...      9        9       34
14           Wards destroyed      4          3  ...      3       10        5
15   Control Wards Purchased      0          6  ...      7        2       10
16                      VSPM   0.57       0.98  ...    1.1     1.51     2.12
17                       WPM   0.29       0.29  ...   0.37     0.37     1.38
18                      VWPM      0       0.24  ...   0.29     0.08     0.41
19                      WCPM   0.16       0.12  ...   0.12     0.41      0.2
20                       VS%     9%      15.4%  ...  15.6%    21.4%    30.1%
21  Total damage to Champion  11637      11069  ...   9516    12053     3669
22           Physical Damage   6533       9367  ...    166    11214      604
23              Magic Damage   5104        395  ...   9340      755     3065
24               True Damage      0       1307  ...     10       84        0
25                       DPM    474        451  ...    388      491      149
26                      DMG%  24.1%      22.9%  ...  17.4%      22%     6.7%
27            K+A Per Minute    0.2        0.2  ...   0.61     0.65     0.73
28                       KP%  83.3%      83.3%  ...  65.2%    69.6%    78.3%
29                Solo kills    NaN        NaN  ...    NaN      NaN      NaN
30              Double kills      0          0  ...      1        2        0
31              Triple kills      0          0  ...      0        0        0
32              Quadra kills      0          0  ...      0        0        0
33               Penta kills      0          0  ...      0        0        0
34                     GD@15  -2492      -1117  ...    -21    -1272     -292
35                    CSD@15     -9        -27  ...    -29       -1       -6
36                    XPD@15  -1149      -1627  ...   -191     -287    -1322
37                   LVLD@15     -1         -1  ...      0        0       -1
38   Damage dealt to turrets      0        883  ...   1557     4582      717
39                Total heal   1010       5737  ...   2600     2343     3120
40     Damage self mitigated  16638      10704  ...  16506     5476    11927
41         Time ccing others     26         16  ...     18       26       11
42        Total damage taken  18869      19320  ...  14264    11844     9137

This gets you a nice .csv file, data.csv.

Bonus: the code also works with the other URL:

import requests
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

df = pd.read_html(res.text, skiprows=[0])
df = pd.concat(df)
print(df)

Prints:

        100 Thieves  S10  NA  18  38.9%  ...  33.3  1976  3.0  1.23  1.35
0               CLG  S10 NaN  19  26.3%  ...  32.6  1790  3.3  1.21  1.30
1            Cloud9  S10 NaN  18  72.2%  ...  33.4  1971  3.0  1.12  1.30
2          Dignitas  S10 NaN  19  31.6%  ...  32.7  1590  3.1  1.27  1.33
3     Evil Geniuses  S10 NaN  18  44.4%  ...  32.2  1920  3.3  1.39  1.41
4          FlyQuest  S10 NaN  18  66.7%  ...  32.8  1856  3.3  1.21  1.77
5  Golden Guardians  S10 NaN  18  50.0%  ...  33.8  1992  3.4  1.26  1.53
6         Immortals  S10 NaN  18  22.2%  ...  31.1  1717  3.3  1.35  1.46
7       Team Liquid  S10 NaN  18  83.3%  ...  33.6  1784  3.4  1.24  1.51
8               TSM  S10 NaN  18  66.7%  ...  32.5  1741  3.2  1.33  1.33
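
In the printed frame above, the 100 Thieves row sits where the column names should be, which suggests that skiprows=[0] dropped the real header row of this table. Below is a minimal sketch on a made-up inline table. The column names Name, Season and Games are assumptions, not the site's actual headers, and it assumes lxml (or another parser that read_html supports) is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the team list page: one table with a <thead>.
html = """
<table>
  <thead><tr><th>Name</th><th>Season</th><th>Games</th></tr></thead>
  <tbody>
    <tr><td>CLG</td><td>S10</td><td>19</td></tr>
    <tr><td>Cloud9</td><td>S10</td><td>18</td></tr>
  </tbody>
</table>
"""

# Without skiprows, the <thead> row is used for the column names.
tables = pd.read_html(StringIO(html))
df = pd.concat(tables, ignore_index=True)
print(df)

# With skiprows=[0], the header row is skipped, so the first data row
# is expected to end up as the column names instead.
skipped = pd.read_html(StringIO(html), skiprows=[0])[0]
print(skipped)
```

If the header matters for the .csv, dropping skiprows for this URL is probably the simplest fix. The stats page seems to need it, since its first row is not the header.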

Upvotes: 3

LeelaPrasad

Reputation: 466

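That's because the "class" attribute of the table on the second webpage is different from the one on the first webpage.

Did you try changing the class name to "completestats tablesaw" when you ran the script against the second URL?

soup.find() returns None when it doesn't find the element you ask for in an HTML page. Your error says that None has no attribute 'find', which means table itself is None, so the lookup on line 18 of your script already failed before "thead" was ever searched. See the documentation of find().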
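
A minimal sketch of that check, run against a made-up snippet rather than the live page. The markup below is an assumption (in particular, that the header cells sit in a plain row with no thead); only the two class strings come from this thread:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the second page's markup.
html = """
<table class="completestats tablesaw">
  <tr><th>Player</th><th>Huni</th><th>Svenskeren</th></tr>
  <tr><td>Kills</td><td>2</td><td>0</td></tr>
  <tr><td>Deaths</td><td>5</td><td>6</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The class string from the first page does not match here, so find() returns None.
missing = soup.find("table", class_="table_list playerslist tablesaw trhover")
print(missing)

# select_one matches on a single class, whatever other classes are present.
table = soup.select_one("table.completestats")
if table is None:
    raise SystemExit("table not found; check the class name in the page source")

# Collect every row, accepting both <th> and <td> cells.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
columns, data = rows[0], rows[1:]
print(columns)
print(data)
```

Checking for None right after the lookup turns the confusing AttributeError into a message that points at the actual cause.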

Upvotes: 1

Constantine Ketskalo

Reputation: 630

Does the page contain those tags at all? You could place a breakpoint on the line where you get the error, then try different expressions in the watch area of your IDE to see what is there and what is not.

Have you considered scrapy as a way to code your app? It has decent built-in functionality and good tutorials on the official website. Besides, you can wrap your code in classes and, for instance, create a different spider class for each website, each with its own logic. Your code will be more readable, even for yourself during development, if you divide it into classes and methods instead of writing it all in one file.
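
A rough sketch of the "one class per page layout" idea, using plain Python classes with BeautifulSoup rather than actual scrapy spiders. The class names, the selector attribute and the sample markup are all made up for illustration; only the CSS class names come from this thread:

```python
from bs4 import BeautifulSoup


class TableScraper:
    """Base class: subclasses only state how to locate their table."""

    selector = "table"  # CSS selector, overridden per page layout

    def parse(self, html):
        soup = BeautifulSoup(html, "html.parser")
        table = soup.select_one(self.selector)
        if table is None:
            raise ValueError(f"no element matches {self.selector!r}")
        # One list of cell texts per table row, header cells included.
        return [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]


class TeamListScraper(TableScraper):
    selector = "table.playerslist"


class GameStatsScraper(TableScraper):
    selector = "table.completestats"


# Hypothetical sample markup standing in for the stats page.
sample = (
    '<table class="completestats tablesaw">'
    "<tr><th>Player</th><th>Huni</th></tr>"
    "<tr><td>Kills</td><td>2</td></tr>"
    "</table>"
)
rows = GameStatsScraper().parse(sample)
print(rows)
```

Each new page layout then needs only a small subclass, and a wrong selector fails with a clear error instead of an AttributeError on None.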

Upvotes: -1
