Reputation: 39
Right now when I run this, I get a final output that includes two sets of header columns, and as a result it won't let me write it to a .csv either. How would I fix this so that it only includes the columns from the first table? (The column names are the same throughout.)
import pandas as pd
import urllib.request
import bs4 as bs

urls = ['https://fantasysportsdaily.net/bsl/boxes/1-1.html',
        'https://fantasysportsdaily.net/bsl/boxes/1-2.html'
        ]

final = []
for url in urls:
    df = pd.read_html(url, header=0)
    format1 = df[1].iloc[:, : 16]
    colname1 = format1.columns[0]
    format1.insert(1, 'Team', colname1)
    format1.rename(columns = {list(format1)[0]: 'Player'}, inplace = True)
    format2 = format1.drop(format1[format1.Player == 'TEAM TOTALS'].index)
    team1 = format2.drop(format2[format2.Player == 'PERCENTAGES'].index)
    format3 = df[2].iloc[:, : 16]
    colname2 = format3.columns[0]
    format3.insert(1, 'Team', colname2)
    format3.rename(columns = {list(format3)[0]: 'Player'}, inplace = True)
    format4 = format3.drop(format3[format3.Player == 'TEAM TOTALS'].index)
    team2 = format4.drop(format4[format4.Player == 'PERCENTAGES'].index)
    both_teams = [team1, team2]
    combined = pd.concat(both_teams)
    final.append(combined, ignore_index=True)

print(final)
##final.to_csv ('boxes.csv', index = True, header=True)
Upvotes: 1
Views: 1401
Reputation: 11515
Please pay attention to the following points:

1. pd.read_html fetches the page with a new connection on each call, so it's better to reuse one session for the same host. That's why I've used requests.Session() and passed the fetched content to pandas.
2. Refactor the repeated logic into a function (or a class), as I've done within the code.
3. .iloc[] can actually drop columns and rows as well, so you don't need to go around in circles.

import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0'
}

def myformat(content, key):
    df = pd.read_html(content)[key].iloc[:-2, :-1]
    df.insert(1, 'Team', df.columns[0])
    df.rename(columns={df.columns[0]: "Player"}, inplace=True)
    return df

def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        allin = []
        for num in range(1, 3):
            r = req.get(url.format(num))
            df1 = myformat(r.content, 1)
            df2 = myformat(r.content, 2)
            final = pd.concat([df1, df2], ignore_index=True)
            allin.append(final)
        target = pd.concat(allin, ignore_index=True)
        print(target)

main('https://fantasysportsdaily.net/bsl/boxes/1-{}.html')
Output:
Player Team POS MIN FG FGA 3P ... REB A PF ST TO BL PTS
0 David Robinson Spurs C 32 6 14 0 ... 15 1.0 4.0 2.0 3.0 2.0 21.0
1 Reggie Miller Spurs SG 42 5 12 3 ... 6 3.0 0.0 2.0 3.0 0.0 17.0
2 Tom Gugliotta Spurs PF 25 6 7 0 ... 11 1.0 4.0 2.0 3.0 1.0 17.0
3 Allan Houston Spurs PG 27 5 19 2 ... 1 1.0 1.0 2.0 3.0 0.0 12.0
4 Sean Elliott Spurs SF 34 3 6 0 ... 4 0.0 3.0 2.0 2.0 0.0 7.0
5 Rik Smits Spurs PF 32 1 10 0 ... 9 1.0 4.0 0.0 1.0 0.0 6.0
6 Mark Jackson Spurs PG 21 3 9 0 ... 3 7.0 1.0 2.0 6.0 0.0 6.0
7 Will Perdue Spurs C 16 1 3 0 ... 1 1.0 1.0 1.0 0.0 1.0 4.0
8 Robert Pack Spurs SG 12 0 2 0 ... 0 1.0 1.0 0.0 0.0 0.0 0.0
9 John Starks Lakers SG 39 10 20 2 ... 7 1.0 2.0 2.0 4.0 0.0 27.0
10 Magic Johnson Lakers PG 36 7 10 1 ... 7 7.0 1.0 1.0 2.0 0.0 20.0
11 Eddie Jones Lakers SF 31 4 7 1 ... 5 3.0 0.0 2.0 2.0 0.0 12.0
12 Elden Campbell Lakers PF 24 5 10 0 ... 5 0.0 4.0 0.0 0.0 1.0 12.0
13 Cedric Ceballos Lakers PF 32 3 11 0 ... 11 3.0 6.0 4.0 7.0 0.0 10.0
14 Vlade Divac Lakers C 24 3 6 0 ... 9 1.0 5.0 1.0 1.0 1.0 6.0
15 Pervis Ellison Lakers C 18 3 4 0 ... 4 0.0 6.0 1.0 0.0 1.0 6.0
16 Nick Van Exel Lakers PG 17 3 7 0 ... 1 3.0 0.0 0.0 1.0 0.0 6.0
17 Corie Blount Lakers C 6 0 0 0 ... 4 0.0 1.0 1.0 1.0 1.0 4.0
18 Anthony Peeler Lakers SF 13 0 4 0 ... 1 1.0 0.0 0.0 2.0 0.0 0.0
19 Terry Porter Timberwolves PG 31 6 15 2 ... 4 1.0 2.0 1.0 4.0 0.0 16.0
20 Kendall Gill Timberwolves PG 26 6 10 1 ... 5 5.0 0.0 0.0 3.0 0.0 15.0
21 J.R. Rider Timberwolves SG 34 7 14 0 ... 5 4.0 4.0 0.0 6.0 1.0 14.0
22 Larry Johnson Timberwolves SF 31 3 13 0 ... 10 3.0 1.0 0.0 1.0 0.0 8.0
23 LaPhonso Ellis Timberwolves PF 30 1 13 0 ... 15 2.0 3.0 0.0 1.0 1.0 6.0
24 J.R. Reid Timberwolves PF 18 1 4 0 ... 3 0.0 1.0 2.0 3.0 0.0 4.0
25 Mark Davis Timberwolves SF 17 1 3 0 ... 3 0.0 0.0 0.0 1.0 0.0 2.0
26 Eric Riley Timberwolves C 13 1 2 0 ... 5 0.0 0.0 1.0 1.0 0.0 2.0
27 Kevin Garnett Timberwolves C 35 0 8 0 ... 9 2.0 2.0 2.0 0.0 2.0 1.0
28 Micheal Williams Timberwolves PG 5 0 2 0 ... 2 1.0 1.0 0.0 0.0 0.0 0.0
29 Jim McIlvaine Bullets C 30 5 8 0 ... 6 1.0 2.0 0.0 1.0 6.0 16.0
30 Ledell Eackles Bullets SG 30 6 9 2 ... 9 1.0 2.0 3.0 2.0 0.0 15.0
31 Juwan Howard Bullets PF 29 4 10 0 ... 6 1.0 2.0 1.0 1.0 0.0 15.0
32 Avery Johnson Bullets PG 35 6 16 0 ... 2 6.0 2.0 2.0 2.0 0.0 14.0
33 Tim Legler Bullets SF 28 5 13 0 ... 2 0.0 0.0 1.0 1.0 0.0 10.0
34 David Benoit Bullets C 18 2 8 1 ... 10 1.0 1.0 1.0 1.0 0.0 7.0
35 Brent Price Bullets SG 18 2 6 1 ... 2 2.0 1.0 1.0 0.0 0.0 5.0
36 Rasheed Wallace Bullets SF 22 2 6 0 ... 1 2.0 1.0 1.0 0.0 0.0 4.0
37 Cory Alexander Bullets PG 9 0 2 0 ... 2 2.0 0.0 0.0 3.0 0.0 1.0
38 Mitchell Butler Bullets PF 19 0 1 0 ... 6 0.0 1.0 0.0 0.0 0.0 0.0
[39 rows x 17 columns]
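As for the CSV part of the question: once everything is combined into a single DataFrame like target above, writing it out is one call. A minimal sketch with toy frames (the column names and data here are illustrative, not the real scraped box scores):

```python
import pandas as pd

# Toy stand-ins for the two parsed team tables (illustrative data only).
team1 = pd.DataFrame({'Player': ['A', 'B'], 'Team': ['Spurs', 'Spurs'], 'PTS': [21, 17]})
team2 = pd.DataFrame({'Player': ['C', 'D'], 'Team': ['Lakers', 'Lakers'], 'PTS': [27, 20]})

# One concat over the list produces a single frame with a single header row.
target = pd.concat([team1, team2], ignore_index=True)

# index=False keeps the file to just the data columns.
target.to_csv('boxes.csv', index=False)
```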
Upvotes: 2
Reputation: 30022
pandas.concat() can concatenate a list of pandas objects with the same structure into one:
final = []
for url in urls:
    ...
    combined = pd.concat(both_teams)
    final.append(combined)

final_df = pd.concat(final, ignore_index=True)
print(final_df)
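The key difference from the question's code: a plain Python list's append() takes no ignore_index argument, and printing a list of DataFrames shows each frame's own header. Concatenating the collected list once at the end yields one frame with one header. A toy sketch (the data is illustrative):

```python
import pandas as pd

# Two toy per-page frames with identical columns, standing in for the
# per-URL results built inside the loop.
page1 = pd.DataFrame({'Player': ['A'], 'Team': ['X'], 'PTS': [10]})
page2 = pd.DataFrame({'Player': ['B'], 'Team': ['Y'], 'PTS': [12]})

final = [page1, page2]                          # collect frames in a plain list
final_df = pd.concat(final, ignore_index=True)  # single frame, single header

print(final_df)  # index runs 0..1 thanks to ignore_index=True
```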
Upvotes: 2