Reputation: 1511
I'm trying to scrape the table on this page.
I can see from the browser debugger that the table I want is there in the HTML. e.g. you can see Peptide Name:
I wrote this code to extract this table:
import requests
from bs4 import BeautifulSoup

for i in range(1001, 1003):
    # try:
    res = requests.get("https://webs.iiitd.edu.in/raghava/antitbpdb/display.php?details=" + str(i))
    soup = BeautifulSoup(res.content, 'html.parser')
    table = soup.find_all('table')
    print(table)
But the output that is printed is:
[<table bgcolor="#DAD5BF" border="1" cellpadding="5" width="970"><tr><td align="center">\n\t This page displays user query in tabular form.\n</td></tr>\n</table>, <table width="970px"><tr><td align="center"><br/><font color="black" size="5px">1001 details</font><br/></td></tr></table>]
Can someone explain why find_all is not finding all of the tables (specifically the table I want), and how I can fix this?
Upvotes: 1
Views: 68
Reputation: 52665
FYI (if you want to know the root cause of your issue): the target table has invalid markup:
<table class ="tab" cellpadding= "5" ... STYLE="border-spacing: 0px;border-style: line ;
<tr bgcolor="#DAD5BF"></tr>
Note that the opening tag is not closed: <table ... (it should be <table ...>). Also, the ancestor element opens as <div> while its closing tag is </p>.
That's why BeautifulSoup doesn't recognize this as a table, and thus it's not returned by soup.find_all('table').
However, modern browsers have built-in tools to "fix" broken tags, so in the browser the table doesn't look "broken": a closing </div> is added to the ancestor div, while the p tag is transformed into an empty node <p></p>.
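If you want to check whether a more lenient parser recovers the table, one option is to count how many tables each parser finds. This is only a sketch: count_tables_by_parser is a made-up helper name, lxml and html5lib are optional installs, and I have not verified which of them (if any) recovers the table on this particular page. html5lib is documented as parsing pages the way a web browser does, so it is the most likely candidate.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tables_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <table> tags found, or None if not installed}."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The requested parser library is not installed; record that and move on.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("table"))
    return counts


# Well-formed stand-in markup (an assumption, not the real page).
sample = "<div><table><tr><td>Peptide Name</td><td>Polydim-I</td></tr></table></div>"
counts = count_tables_by_parser(sample)
print(counts)
```

To try it on the real page, pass res.content from the requests.get call in the question instead of sample. If one parser reports more tables than html.parser, that parser is repairing the broken markup.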
Upvotes: 0
Reputation: 28564
I'm not sure why it's not showing up.
Since the data is in a table anyway, I just went ahead and used pandas' read_html:
import pandas as pd
url = 'https://webs.iiitd.edu.in/raghava/antitbpdb/display.php?details=antitb_1001'
tables = pd.read_html(url)
table = tables[-1]
Output:
print (table)
    0                         1
0   Primary information       NaN
1   ID                        antitb_1001
2   Peptide Name              Polydim-I
3   Sequence                  AVAGEKLWLLPHLLKMLLTPTP
4   N-terminal Modification   Free
5   C-terminal Modification   Free
6   Chemical Modification     None
7   Linear/ Cyclic            Linear
8   Length                    22
9   Chirality                 L
10  Nature                    Amphipathic
11  Source                    Natural
12  Origin                    Isolated from the venom of the Neotropical was...
13  Species                   Mycobacterium abscessus subsp. massiliense
14  Strain                    Mycobacterium abscessus subsp. massiliense iso...
15  Inhibition Concentartion  MIC = 60.8 μg/mL
16  In vitro/In vivo          Both
17  Cell Line                 Peritoneal macrophages, J774 macrophages cells...
18  Inhibition Concentartion  Treatment of infected macrophages with 7.6 μg...
19  Cytotoxicity              Non-cytotoxic, 10% cytotoxicity on J774 cells ...
20  In vivo Model             6 to 8 weeks old BALB/c and IFN-γKO (Knockout...
21  Lethal Dose               2 mg/kg/mLW shows 90% reduction in bacterial load
22  Immune Response           NaN
23  Mechanism of Action       Cell wall disruption
24  Target                    Cell wall
25  Combination Therapy       None
26  Other Activities          NaN
27  Pubmed ID                 26930596
28  Year of Publication       2016
29  3-D Structure             View in Jmol or Download Structure
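Since the result is a two-column label/value table, you can index it by the label column to pull out a single field such as Peptide Name. A minimal sketch, assuming the columns are named 0 and 1 as in the output above; the small DataFrame here is a hand-built stand-in so the snippet runs without hitting the site, and fields is just an illustrative name.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (values copied from the output above).
table = pd.DataFrame([
    ["ID", "antitb_1001"],
    ["Peptide Name", "Polydim-I"],
    ["Length", "22"],
])

# Use column 0 (the labels) as the index and keep column 1 (the values).
fields = table.set_index(0)[1]

peptide_name = fields["Peptide Name"]
print(peptide_name)
```

Note that some labels repeat in the real table (Inhibition Concentartion appears twice), so looking up one of those returns a Series of values rather than a single string.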
Upvotes: 2