Reputation: 1511
Can someone tell me why, when I run this code:
import urllib2

for i in range(1,2):
    id_name ='AP' + str("{:05d}".format(i))
    web_page = "http://aps.unmc.edu/AP/database/query_output.php?ID=" + id_name
    page = urllib2.urlopen(web_page)
    html = page.read()
    print html
It returns:
<html>
<head>
<title>detailed information</title>
<style type="text/css">
H1 {font-family:"Time New Roman", Times; font-style:bold; font-size:18pt; color:blue}
H1{text-align:center}
P{font-family:"Time New Roman", Times; font-style:bold; font-size:14pt; line-height:20pt}
P{text-align:justify;margin-left:0px; margin-right:0px;color:blue}
/body{background-image:url('sky.gif')}
/
A:link{color:blue}
A:visited{color:#996666}
</style>
</head>
<H1>Antimicrobial Peptide APAP00001</H1>
<html>
<p style="margin-left: 400px; margin-top: 4; margin-bottom: 0; line-height:100%">
<b>
<a href = "#" onclick = "window.close(self)"><font size="3" color=blue>Close this window
</font> </a>
</b>
</p>
</p>
</body>
</html>
And not the actual data on the page (http://aps.unmc.edu/AP/database/query_output.php?ID=00001) (e.g. net charge, length)?
If I edit this code slightly somehow, is it possible to return all of the information on the page (e.g. the information about net charge, length etc), and not just information about how the page is formatted?
Thanks
Edit 1: Due to Gahan's comment below, I tried this:
import requests
from bs4 import BeautifulSoup

for i in range(8,9):
    webpage = "https://dbaasp.org/peptide-card?type=39&id=" + str(i)
    response = requests.get(webpage)
    soup = BeautifulSoup(response.content, 'html.parser')
    print soup
However, I still seem to get the same result (for example, if I run the Edit 1 code, redirect the output to a file, and then grep that file for the peptide sequence, it is not there).
Upvotes: 0
Views: 42
Reputation: 4213
Use the requests library:
import requests
from bs4 import BeautifulSoup

data_require = ["Net charge", ]
for i in range(1,2):
    id_value = "{:05d}".format(i)
    url = "http://aps.unmc.edu/AP/database/query_output.php"
    payload = {"ID": id_value}
    # requests builds the query string (?ID=00001) from params
    response = requests.get(url, params=payload)
    soup = BeautifulSoup(response.content, 'html.parser')
    table_structure = soup.find('table')
    all_p_tag = table_structure.find_all('p')
    data = {}
    # The loop treats the <p> tags as alternating label/value pairs;
    # j avoids shadowing the outer i, and len - 1 guards against an odd count.
    for j in range(0, len(all_p_tag) - 1, 2):
        label = all_p_tag[j].text
        value = all_p_tag[j+1].text.encode('utf-8').strip()
        data[label] = value
        print("{} {}".format(label, value))
    print(data)
Note:
You don't need to wrap "{:05d}".format(i) in str(): format() is string formatting, so it already returns a string.
Also, I have updated the code to extract the tag details too. You don't need grep for that, because BeautifulSoup already provides that facility.
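As a rough illustration of that last point: once the text of the p tags is in a list, picking out the fields named in data_require is a dictionary lookup rather than a grep. This sketch assumes the texts alternate label, value, as the loop above does. The sample values are placeholders, not real data from the site, and pair_fields is a made-up helper name.

```python
def pair_fields(texts):
    # Hypothetical helper: pair even-indexed labels with odd-indexed values.
    # zip() stops at the shorter slice, so a trailing unpaired label is dropped.
    return dict(zip(texts[0::2], texts[1::2]))


# Placeholder strings standing in for [p.text for p in all_p_tag].
sample_texts = ["Net charge", "<value 1>", "Length", "<value 2>", "Unpaired label"]

data = pair_fields(sample_texts)
data_require = ["Net charge"]
# Keep only the requested labels that were actually found.
wanted = {key: data[key] for key in data_require if key in data}
print(wanted)
```

Pairing with zip instead of indexing i+1 also avoids an IndexError when the page returns an odd number of p tags.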
Upvotes: -1
Reputation: 77892
In your original snippet, you use "AP00001" as query param:
id_name ='AP' + str("{:05d}".format(i))
so your URL is "http://aps.unmc.edu/AP/database/query_output.php?ID=AP00001" instead of "http://aps.unmc.edu/AP/database/query_output.php?ID=00001".
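To see the difference without hitting the server, here is a minimal sketch that builds both URLs side by side. The helper name build_url is made up for illustration; only the base URL and the zero-padded ID format come from the snippets in this thread.

```python
BASE_URL = "http://aps.unmc.edu/AP/database/query_output.php"


def build_url(i, prefix=""):
    # build_url is a hypothetical helper, not part of any library.
    # {:05d} zero-pads i to five digits; format() already returns a str.
    return "{}?ID={}{:05d}".format(BASE_URL, prefix, i)


original_url = build_url(1, prefix="AP")  # what the question's code requests
intended_url = build_url(1)               # the page the question links to

print(original_url)
print(intended_url)
```

The extra "AP" is also why the returned heading reads "Antimicrobial Peptide APAP00001".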
A fixed version of your first snippet, using requests:
import requests

url = "http://aps.unmc.edu/AP/database/query_output.php"
for i in range(1,2):
    # No "AP" prefix: the server expects the bare zero-padded ID
    id_name = "{:05d}".format(i)
    response = requests.get(url, params={"ID":id_name})
    print response.content
Upvotes: 2