Rob B.

Reputation: 128

Python Continue Loop

I am using the following code from this tutorial (http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/).

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results

tds = tr.find_all("td")

try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
    names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
    years = str(tds[1].get_text())
    positions = str(tds[2].get_text())
    parties = str(tds[3].get_text())
    states = str(tds[4].get_text())
    congress = tds[5].get_text()

except:
    print "bad tr string"
    continue #This tells the computer to move on to the next item after it encounters an error

print names, years, positions, parties, states, congress

However, I get an error saying that 'continue' is not properly in loop on line 27. I am using Notepad++ and Windows PowerShell. How do I make this code work?

Upvotes: 2

Views: 333

Answers (5)

original_username

Reputation: 2399

Whitespace is significant in Python.

This is where things go downhill:

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results

You need to keep indenting the code to the appropriate level for as long as you want it to run inside the loop.

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
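As a rough illustration of why the indentation level matters, the sketch below records a value at two different depths. The data and the names `groups`, `inner` and `outer` are made up for this example, and it avoids print so that it runs unchanged on Python 2 and Python 3.

```python
# Hypothetical data standing in for the links found in each table row.
groups = [["a", "b"], ["c"]]

inner = []  # filled at the depth of the inner loop body
outer = []  # filled at the depth of the outer loop body

for group in groups:
    for link in group:
        last = link
        inner.append(last)  # runs once per link
    outer.append(last)      # runs once per group, with the last link seen

# inner is ['a', 'b', 'c'] and outer is ['b', 'c']
```

The same statement does different work depending only on how far it is indented, so it is worth deciding which loop each line is meant to belong to.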

Upvotes: 1

rodxander

Reputation: 53

My answer may be simple, but the continue really is not inside a loop. It must be inside a loop, just as break must be. Your indentation is probably off, and indentation is essential in Python.
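A minimal sketch of that rule, using the built-in compile() to ask Python whether a snippet is accepted. The two source strings and the helper name `parses` are invented for illustration and are not taken from the question.

```python
# Hypothetical snippets: the same try/except, once at top level and once in a loop.
outside_loop = (
    "try:\n"
    "    x = 1\n"
    "except Exception:\n"
    "    continue\n"
)
inside_loop = (
    "for i in range(3):\n"
    "    try:\n"
    "        x = 1\n"
    "    except Exception:\n"
    "        continue\n"
)

def parses(source):
    """Return True if Python accepts the source, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

outside_ok = parses(outside_loop)  # False: continue is not inside any loop
inside_ok = parses(inside_loop)    # True: the try/except is nested in the for loop
```

The error is reported before any code runs, which is why the script in the question fails immediately rather than partway through the table.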

Upvotes: 0

Jeff Tratner

Reputation: 17076

You have to indent your code one more level (i.e. 4 spaces or 1 tab) beyond the indentation of the for loop. The try/except isn't in the for loop, which is why you get the continue error.

Indentation shows which statements belong to the same block: a for loop starts a new block, and everything in that block must be indented underneath it.
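Here is a small sketch of that structure with the try/except indented under the for loop. The `rows` list is made-up data standing in for the table cells, and catching IndexError specifically (rather than a bare except) is my own choice for the example.

```python
# Hypothetical rows; the second one is too short, like a badly formatted <tr>.
rows = [["Adams", "1873"], ["bad"], ["Baker", "1874"]]

parsed = []
for row in rows:
    try:
        name = row[0]
        year = row[1]  # raises IndexError for the short row
    except IndexError:
        # continue is legal here because this block is nested inside the for loop
        continue
    parsed.append((name, year))

# parsed is [('Adams', '1873'), ('Baker', '1874')]
```

The short row is skipped and the loop moves on to the next one, which is the behaviour the tutorial code is aiming for.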

Upvotes: 0

John

Reputation: 13699

It looks like your indentation is off. Try this:

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')

        print fulllink #print in terminal to verify results

        tds = tr.find_all("td")

        try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
            names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

        except:
            print "bad tr string"
            continue #This tells the computer to move on to the next item after it encounters an error

        print names, years, positions, parties, states, congress

Upvotes: 1

John La Rooy

Reputation: 304137

Everything from print fulllink down is outside the for loop:

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
    ## indented here!!!!!
    print fulllink #print in terminal to verify results

    tds = tr.find_all("td")

    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()

    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error

    print names, years, positions, parties, states, congress

Upvotes: 2
