bhola prasad
bhola prasad

Reputation: 725

How to create an empty dataframe and keep inserting data in it in a loop?

I scraped some table data from a website. The actual table on the website looks like this -

Actual Table

I used beautifulsoup to get this data which in this format -

[<td class="TableHeadingLeft" width="175">
                        Team
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Mat
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Won
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Lost
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Tied
                    </td>,
 <td class="TableHeadingRight" width="35">
                        N/R
                    </td>,
 <td class="TableHeadingRight" width="45">
                        Points
                    </td>,
 <td class="TableHeadingRight" width="55">
                        Net R/R
                    </td>,
 <td class="TableHeadingRight" width="75">
                        For
                    </td>,
 <td class="TableHeadingRight" width="75">
                        Against
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=MIN">Mumbai Indians</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.421      
                    </td>,
 <td align="right">
                        2380/275.1      
                    </td>,
 <td align="right">
                        2282/277.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=CSK">Chennai Super Kings</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.131      
                    </td>,
 <td align="right">
                        2043/274.1      
                    </td>,
 <td align="right">
                        2012/274.5      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=DDV">Delhi Capitals</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.044      
                    </td>,
 <td align="right">
                        2207/272.5      
                    </td>,
 <td align="right">
                        2238/278.1      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=SUN">Sunrisers Hyderabad</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        +0.577      
                    </td>,
 <td align="right">
                        2288/269.2      
                    </td>,
 <td align="right">
                        2200/277.5      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=KKR">Kolkata Knight Riders</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        +0.028      
                    </td>,
 <td align="right">
                        2466/270.4      
                    </td>,
 <td align="right">
                        2419/266.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=KXI">Kings XI Punjab</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        -0.251      
                    </td>,
 <td align="right">
                        2429/276.3      
                    </td>,
 <td align="right">
                        2503/277.0      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=RRO">Rajasthan Royals</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        1       
                    </td>,
 <td align="right">
                        11      
                    </td>,
 <td align="right">
                        -0.449      
                    </td>,
 <td align="right">
                        2153/257.0      
                    </td>,
 <td align="right">
                        2192/248.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=RCB">Royal Challengers Bangalore</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        1       
                    </td>,
 <td align="right">
                        11      
                    </td>,
 <td align="right">
                        -0.607      
                    </td>,
 <td align="right">
                        2146/258.4      
                    </td>,
 <td align="right">
                        2266/254.3      
                    </td>]

Now, I used a loop and some code to extract the data that I need.

for data in table_data.find_all('td'):
    print(''.join(data.text.split()))

The output of this -

Team
Mat
Won
Lost
Tied
N/R
Points
NetR/R
For
Against
MumbaiIndians
14
9
5
0
0
18
+0.421
2380/275.1
2282/277.2
ChennaiSuperKings
14
9
5
0
0
18
+0.131
2043/274.1
2012/274.5
DelhiCapitals
14
9
5
0
0
18
+0.044
2207/272.5
2238/278.1
SunrisersHyderabad
14
6
8
0
0
12
+0.577
2288/269.2
2200/277.5
KolkataKnightRiders
14
6
8
0
0
12
+0.028
2466/270.4
2419/266.2
KingsXIPunjab
14
6
8
0
0
12
-0.251
2429/276.3
2503/277.0
RajasthanRoyals
14
5
8
0
1
11
-0.449
2153/257.0
2192/248.2
RoyalChallengersBangalore
14
5
8
0
1
11
-0.607
2146/258.4
2266/254.3

Created an empty dataframe of the desired shape -

col_name = ['Team','Mat','Won','Lost','Tied','N/R','Points','NetR/R','For','Against']

import pandas as pd 
df = pd.DataFrame(data=None, columns=col_name)

Now, I couldn't able to understand, how to add these data to this dataframe.

Upvotes: 0

Views: 697

Answers (3)

Rathan
Rathan

Reputation: 69

Once you have the table data as 1D list

table_data_list = [''.join(i.text.split() for i in table_data.find_all('td')]

Convert the data into 2D using numpy reshape

df = pd.DataFrame(np.array(table_data_list).reshape(-1,10))

Upvotes: 0

Subbu VidyaSekar
Subbu VidyaSekar

Reputation: 2615

There is a simple way to create a dataframe from list of list.

Code :

ll = []
for data in table_data.find_all('td'):
    ll.append(''.join(data.text.split()))  

n = 10
final = [ll[i:i + n] for i in range(0, len(ll), n)]
col = final [0]
df = pd.DataFrame(final) 
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df

ll is list of list from your table data.

final is split the ll with 10 columns as you mentioned in columns

The last two lines are to make the first row as column name because we just created a dataframe without giving any column names.

Upvotes: 1

Abhilash Awasthi
Abhilash Awasthi

Reputation: 797

Use pd.concat

import pandas as pd 

col_name = ['Team','Mat','Won','Lost','Tied','N/R','Points','NetR/R','For','Against']

# Main dataframe
df = pd.DataFrame(data=None, columns=col_name)

# df1 is the data frame that you want to concat in your original data frame

df = pd.concat([df, df1])

Upvotes: 0

Related Questions