YasserKhalil

Reputation: 9568

Drop specific columns from DataFrame in Pandas

I was able to scrape a table from a website that requires credentials, but the DataFrame I ended up with looks weird. This is the code for that part:

soup = BeautifulSoup(res_count.content, 'lxml')
df = pd.DataFrame(soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr'))

print(df)

The output is surrounded by a lot of brackets:

    0                                                  1          2  \
0  \n                                            [[[ ]]]  [[[كود]]]   
1  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[1]]]   
2  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[2]]]   
3  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[3]]]   
4  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[4]]]   
5  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[5]]]   
6  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[6]]]   

                    3                    4   5  
0  [[[الصف الدراسى]]]  [[[العــــــــدد]]]  \n  
1    [[[الصف الأول]]]             [[[66]]]  \n  
2   [[[الصف الثانى]]]             [[[69]]]  \n  
3   [[[الصف الثالث]]]             [[[67]]]  \n  
4   [[[الصف الرابع]]]             [[[59]]]  \n  
5   [[[الصف الخامس]]]             [[[51]]]  \n  
6   [[[الصف السادس]]]             [[[52]]]  \n  

How can I drop the first two columns and, for the remaining columns, get the text without all those brackets [[[....]]]?

I also tried the following lines:

rows = soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr')
print(rows)

and got this result:

[<tr bgcolor="#B9C989">
<th scope="col"><font color="Navy" face="Arial"><b> </b></font></th><th scope="col"><font color="Navy" face="Arial"><b>كود</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>الصف الدراسى</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>العــــــــدد</b></font></th>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$0')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>1</b></font></td><td><font color="#333333"><b>الصف الأول</b></font></td><td><font color="#333333"><b>66</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$1')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>2</b></font></td><td><font color="#333333"><b>الصف الثانى</b></font></td><td><font color="#333333"><b>69</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$2')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>3</b></font></td><td><font color="#333333"><b>الصف الثالث</b></font></td><td><font color="#333333"><b>67</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$3')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>4</b></font></td><td><font color="#333333"><b>الصف الرابع</b></font></td><td><font color="#333333"><b>59</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$4')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>5</b></font></td><td><font color="#333333"><b>الصف الخامس</b></font></td><td><font color="#333333"><b>51</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$5')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>6</b></font></td><td><font color="#333333"><b>الصف السادس</b></font></td><td><font color="#333333"><b>52</b></font></td>
</tr>]

Upvotes: 0

Views: 36

Answers (2)

YasserKhalil

Reputation: 9568

Thanks a lot. After a lot of searching, I found a solution that I could adapt to get what I need:

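from io import StringIO

soup = BeautifulSoup(res_count.content, 'lxml')
table = soup.find_all('table', id='ctl00_ContentPlaceHolder1_GridView2')
# Wrap the markup in StringIO: recent pandas versions deprecate passing literal HTML
df = pd.read_html(StringIO(str(table)))[0]
# Drop the first column (the one holding the select buttons)
df = df.drop(columns=df.columns[0])
print(df)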
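
For anyone who needs to drop more than one column by position, the same `drop` call accepts a slice of `df.columns`. This is only a minimal sketch: the small DataFrame below is a made-up stand-in for the parsed table (a blank button column followed by the three data columns from the question), not the real scraped data.

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: a blank button column,
# then the three data columns shown in the question.
df = pd.DataFrame({
    "Unnamed: 0": [None, None, None],
    "كود": [1, 2, 3],
    "الصف الدراسى": ["الصف الأول", "الصف الثانى", "الصف الثالث"],
    "العــــــــدد": [66, 69, 67],
})

# Drop a single column by position.
without_first = df.drop(columns=df.columns[0])

# Drop several columns by position in one call.
without_first_two = df.drop(columns=df.columns[:2])

# Equivalent positional slice: keep everything from the third column on.
kept = df.iloc[:, 2:]

print(without_first_two)
```

`drop` and `iloc` both return a new DataFrame here, so the original `df` is left unchanged unless the result is assigned back to it.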

Upvotes: 0

Daweo

Reputation: 36838

I need to get the text without all those brackets [[[....]]]

You might use str.strip for that. Consider the following example:

import pandas as pd
df = pd.DataFrame({'A':['[[[1]]]','[[[2]]]','[[[3]]]']})
df['A'] = df['A'].str.strip('[]')
print(df)

Output:

   A
0  1
1  2
2  3
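
Note that `str.strip` only helps when the cells are real strings. In the question, the cells appear to hold BeautifulSoup `Tag` objects, and the brackets look like the way pandas prints nested tags, so there may be nothing to strip. One alternative is to pull the text out of each cell before building the DataFrame. The following is a minimal sketch under that assumption; the sample HTML, the `grid` id and the English headers are made up for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the table in the question.
html = """
<table id="grid">
<tr><th><b> </b></th><th><b>Code</b></th><th><b>Grade</b></th><th><b>Count</b></th></tr>
<tr><td><b><input type="button" value="Select"/></b></td><td><b>1</b></td><td><b>First</b></td><td><b>66</b></td></tr>
<tr><td><b><input type="button" value="Select"/></b></td><td><b>2</b></td><td><b>Second</b></td><td><b>69</b></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.select("#grid tr")

# Take the text of each cell instead of the Tag object itself.
data = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in rows
]

# The first row holds the headers; iloc then skips the leading button column.
df = pd.DataFrame(data[1:], columns=data[0]).iloc[:, 1:]
print(df)
```

All values come out as strings this way, so numeric columns may still need a conversion such as `pd.to_numeric` afterwards.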

Upvotes: 2
