Reputation: 9568
I have been able to scrape a table from a website that requires credentials, but the DataFrame I end up with looks weird. Here is the code for this part:
soup = BeautifulSoup(res_count.content, 'lxml')
df = pd.DataFrame(soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr'))
print(df)
The output is surrounded by a lot of brackets:
0 1 2 \
0 \n [[[ ]]] [[[كود]]]
1 \n [[[<input onclick="javascript:__doPostBack('ct... [[[1]]]
2 \n [[[<input onclick="javascript:__doPostBack('ct... [[[2]]]
3 \n [[[<input onclick="javascript:__doPostBack('ct... [[[3]]]
4 \n [[[<input onclick="javascript:__doPostBack('ct... [[[4]]]
5 \n [[[<input onclick="javascript:__doPostBack('ct... [[[5]]]
6 \n [[[<input onclick="javascript:__doPostBack('ct... [[[6]]]
3 4 5
0 [[[الصف الدراسى]]] [[[العــــــــدد]]] \n
1 [[[الصف الأول]]] [[[66]]] \n
2 [[[الصف الثانى]]] [[[69]]] \n
3 [[[الصف الثالث]]] [[[67]]] \n
4 [[[الصف الرابع]]] [[[59]]] \n
5 [[[الصف الخامس]]] [[[51]]] \n
6 [[[الصف السادس]]] [[[52]]] \n
How can I drop the first two columns and, for the remaining columns, get the text without all those brackets [[[....]]]?
I also tried the following lines:
rows = soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr')
print(rows)
and got this result:
[<tr bgcolor="#B9C989">
<th scope="col"><font color="Navy" face="Arial"><b> </b></font></th><th scope="col"><font color="Navy" face="Arial"><b>كود</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>الصف الدراسى</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>العــــــــدد</b></font></th>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$0')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>1</b></font></td><td><font color="#333333"><b>الصف الأول</b></font></td><td><font color="#333333"><b>66</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$1')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>2</b></font></td><td><font color="#333333"><b>الصف الثانى</b></font></td><td><font color="#333333"><b>69</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$2')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>3</b></font></td><td><font color="#333333"><b>الصف الثالث</b></font></td><td><font color="#333333"><b>67</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$3')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>4</b></font></td><td><font color="#333333"><b>الصف الرابع</b></font></td><td><font color="#333333"><b>59</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$4')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>5</b></font></td><td><font color="#333333"><b>الصف الخامس</b></font></td><td><font color="#333333"><b>51</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$5')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>6</b></font></td><td><font color="#333333"><b>الصف السادس</b></font></td><td><font color="#333333"><b>52</b></font></td>
</tr>]
Upvotes: 0
Views: 36
Reputation: 9568
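Thanks a lot. After a lot of searching, I found a solution that I could adapt to get what I need:
from io import StringIO
soup = BeautifulSoup(res_count.content, 'lxml')
table = soup.find_all('table', id='ctl00_ContentPlaceHolder1_GridView2')
# Wrap the markup in StringIO; recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(table)))[0]
# Drop the first column (the one that holds the select buttons)
df = df.drop([df.columns[0]], axis='columns')
print(df)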
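For anyone who wants to try this without the login step, here is a minimal, self-contained sketch of the same idea. The HTML string is a made-up two-row stand-in for the real page (only the table id is taken from the question, and the English header names are placeholders), and it assumes lxml is installed so that pd.read_html can parse it. It selects the table through the attrs parameter of pd.read_html instead of going through BeautifulSoup, and drops the first two columns by position, as the question originally asked.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup, shaped like the table in the question.
html = """
<table id="ctl00_ContentPlaceHolder1_GridView2">
<tr><th scope="col"><b> </b></th><th scope="col"><b>Code</b></th>
<th scope="col"><b>Grade</b></th><th scope="col"><b>Count</b></th></tr>
<tr><td><input type="button" value="Select"/></td><td><b>1</b></td>
<td><b>First grade</b></td><td><b>66</b></td></tr>
<tr><td><input type="button" value="Select"/></td><td><b>2</b></td>
<td><b>Second grade</b></td><td><b>69</b></td></tr>
</table>
"""

# attrs narrows the search to the table with this id; read_html returns a list.
tables = pd.read_html(
    StringIO(html),
    attrs={"id": "ctl00_ContentPlaceHolder1_GridView2"},
)
df = tables[0]

# Drop the first two columns (button column and code column) by position.
df = df.iloc[:, 2:]
print(df)
```

Dropping by position avoids depending on whatever name pandas gives the empty header cell of the button column.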
Upvotes: 0
Reputation: 36838
I need to get the text without all those brackets [[[....]]]
You might use str.strip for that. Consider the following example:
import pandas as pd
df = pd.DataFrame({'A':['[[[1]]]','[[[2]]]','[[[3]]]']})
df['A'] = df['A'].str.strip('[]')
print(df)
Output:
A
0 1
1 2
2 3
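Note that str.strip only helps when the cells are real strings. In the question the cells appear to be BeautifulSoup Tag objects (the brackets look like pandas printing the nested tags), so it is probably simpler to pull the text out before building the DataFrame. Below is a rough sketch of that, assuming a made-up two-row HTML sample with English placeholder headers; only the table id comes from the question.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup, shaped like the table in the question.
html = """
<table id="ctl00_ContentPlaceHolder1_GridView2">
<tr><th><font><b> </b></font></th><th><font><b>Code</b></font></th>
<th><font><b>Grade</b></font></th><th><font><b>Count</b></font></th></tr>
<tr><td><b><input type="button" value="Select"/></b></td><td><b>1</b></td>
<td><b>First grade</b></td><td><b>66</b></td></tr>
<tr><td><b><input type="button" value="Select"/></b></td><td><b>2</b></td>
<td><b>Second grade</b></td><td><b>69</b></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.select("#ctl00_ContentPlaceHolder1_GridView2 tr")

# get_text(strip=True) returns the plain text of a cell, however deeply it is nested.
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

# Build the frame from plain strings, then drop the first two columns by position.
df = pd.DataFrame(data, columns=header).iloc[:, 2:]
print(df)
```

The values come out as strings here, so a numeric column would still need a conversion such as pd.to_numeric if you want to do arithmetic on it.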
Upvotes: 2