Reputation: 117
I would like the following commands to grab the date from each address in this range, but I can't seem to get it to run more than once. I am using Python 3. As you can see below, the URL for the site is appended with i so that it reads http://zinc.docking.org/substance/10 ; http://zinc.docking.org/substance/11 ... and so on. Here is the code:
import bs4 as bs
import urllib.request
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
ate = row1[0].getText()
print(ate)
This is my output:
$python3 Date.py
November 11th, 2005
The script should, however, give me 3 dates. This code works, so I know that row1[0] does in fact contain a value. I feel like there is some sort of simple formatting error, but I am not sure where to begin troubleshooting. When I format it "correctly", this is the code:
import bs4 as bs
import urllib.request
import pandas as pd
import csv
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    table2 = soup.find("table", attrs={"class": "protomers"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
        ate = row1[0].getText()
        print(ate)
The error I get is as follows:
Traceback (most recent call last):
  File "Stack.py", line 11, in <module>
    ate = row1[1].getText()
IndexError: list index out of range
The first code works, so I know that row1[0] does in fact contain a value. Any ideas?
Upvotes: 0
Views: 143
Reputation: 4862
You might want to fix your indentation:
import bs4 as bs
import urllib.request
site = "http://zinc.docking.org/substance/"
for i in range(10, 16):
    site1 = str("%s%i" % (site, i))
    sauce = urllib.request.urlopen(site1).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table1 = soup.find("table", attrs={"class": "substance-properties"})
    for row in table1.findAll('tr'):
        row1 = row.findAll('td')
    Date = row1[0].getText()
    print(Date)
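To see why the indentation matters, here is a small network-free sketch. It assumes the original script read the value and printed it after the outer loop had finished. The `pages` data and both function names are hypothetical stand-ins for the scraped pages, not part of the original code.

```python
# Hypothetical stand-in for the scraped pages: each page is a list of rows,
# and each row is a list of cell texts (the header row has no data cells).
pages = {
    10: [[], ["first date"]],
    11: [[], ["second date"]],
    12: [[], ["third date"]],
}


def collect_after_loop(pages):
    """Assumed original layout: the value is read once, after both loops."""
    found = []
    for number, rows in pages.items():
        for cells in rows:
            last_cells = cells
    # Dedented to the top level: runs once, so only the last page is kept.
    found.append(last_cells[0])
    return found


def collect_inside_outer_loop(pages):
    """Suggested layout: the value is read once per page."""
    found = []
    for number, rows in pages.items():
        for cells in rows:
            last_cells = cells
        # Indented under the outer loop: runs once for every page.
        found.append(last_cells[0])
    return found


print(collect_after_loop(pages))         # ['third date']
print(collect_inside_outer_loop(pages))  # ['first date', 'second date', 'third date']
```

The only difference between the two functions is the indentation of the `found.append(...)` line, which is the same kind of change suggested for the `getText()` and `print` lines above.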
Edit: You should rename your Date variable. It is not a reserved word, but by convention Python variable names are lowercase (e.g. date).
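As for the IndexError in the fully indented version: one likely cause (an assumption, since the page markup is not shown here) is that some rows, such as a header row, contain no td cells, so row1 is an empty list. A minimal sketch that skips such rows is below. `SAMPLE_HTML` and `first_cells` are hypothetical names for illustration, and it uses the built-in "html.parser" instead of lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one substance page; the real markup may differ.
SAMPLE_HTML = """
<table class="substance-properties">
  <tr><th>Added</th><th>Other</th></tr>
  <tr><td>November 11th, 2005</td><td>value</td></tr>
</table>
"""


def first_cells(html):
    """Return the text of the first <td> in every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "substance-properties"})
    if table is None:
        # No matching table on this page: nothing to extract.
        return []
    texts = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # Rows with only <th> cells give an empty list; indexing it
            # would raise IndexError, so skip them.
            continue
        texts.append(cells[0].get_text(strip=True))
    return texts


print(first_cells(SAMPLE_HTML))  # ['November 11th, 2005']
```

With a guard like this, the extraction can stay inside the inner loop without failing on rows that have no data cells.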
Upvotes: 1