python beautifulsoup loop through table rows by section

Question

I'm new to BeautifulSoup and Python, and I'm pretty sure this is a dead-simple problem, but I can't seem to get anywhere solving it.

I'm trying to loop through rows of an html table, based on "header" rows that group the table by types of candy. My table looks like this:

I want the loop to get the data under each candy heading. So the iterations would get data like this:

first loop iteration: candy_type: KitKat, location: Mall 1, Planned: 63, Actual: 0, Diff: 25

second iteration: candy_type: KitKat, location: Mall 2, Planned: 7, Actual: 0, Diff: 6

... last iteration: candy_type: Skittles, location: Building 2, Planned: 320, Actual: 236, Diff: 0

This is the table code:


   
      Candy
   
   
      
         
            KitKat
         
      
   
   
      LOCATION
      PLANNED
      ACTUAL
      DIFF
   
   
      Mall 1
      63
      0
      25
   
   
      Mall 2
      7
      0
      6
   
   
      
         
            OH Henry
         
      
   
   
      LOCATION
      PLANNED
      ACTUAL
      DIFF
   
   
      Warehouse 1
      195
      122
      30
   
   
      Warehouse 2
      96
      76
      6
   
   
      
         
            Skittles
         
      
   
   
      LOCATION
      PLANNED
      ACTUAL
      DIFF
   
   
      Building 1
      120
      90
      5
   
   
      Building 2
      320
      236
      0

So I tried:

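from bs4 import BeautifulSoup

# Read the local HTML file directly; urllib is only needed for remote URLs.
with open('test.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Each candy heading is a <tr> whose bgcolor is #CEE3F6.
candy_rows = soup.find_all('tr', {'bgcolor': '#CEE3F6'})
for candy_row in candy_rows:
    print(candy_row)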

This prints out the three candy types like this:




KitKat





OH Henry





Skittles

I thought I could group the candy "headers" (i.e. the tr elements whose bgcolor is set to #CEE3F6) and then iterate on that basis, but I cannot figure out how to get further into the data.

Any ideas?
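One possible approach, sketched here under assumptions: walk every tr in document order, remember the most recent candy heading, and attach it to each data row that follows. The sample markup is hypothetical, since the real tags are not shown above; it only assumes that heading rows carry bgcolor="#CEE3F6" and that data rows have four td cells. The names SAMPLE_HTML and rows_by_candy are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a guess at the table structure, assuming only that
# each candy heading is a <tr bgcolor="#CEE3F6"> and data rows have 4 cells.
SAMPLE_HTML = """
<table>
  <tr><td>Candy</td></tr>
  <tr bgcolor="#CEE3F6"><td colspan="4"><b>KitKat</b></td></tr>
  <tr><td>LOCATION</td><td>PLANNED</td><td>ACTUAL</td><td>DIFF</td></tr>
  <tr><td>Mall 1</td><td>63</td><td>0</td><td>25</td></tr>
  <tr><td>Mall 2</td><td>7</td><td>0</td><td>6</td></tr>
  <tr bgcolor="#CEE3F6"><td colspan="4"><b>Skittles</b></td></tr>
  <tr><td>LOCATION</td><td>PLANNED</td><td>ACTUAL</td><td>DIFF</td></tr>
  <tr><td>Building 2</td><td>320</td><td>236</td><td>0</td></tr>
</table>
"""


def rows_by_candy(html):
    """Return one dict per data row, tagged with the heading above it."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    candy_type = None  # most recent heading seen so far

    for row in soup.find_all("tr"):
        # A heading row starts a new section.
        if row.get("bgcolor") == "#CEE3F6":
            candy_type = row.get_text(strip=True)
            continue

        cells = [td.get_text(strip=True) for td in row.find_all("td")]

        # Skip rows before the first heading, rows that are not 4 cells
        # wide, and the LOCATION/PLANNED/ACTUAL/DIFF label rows.
        if candy_type is None or len(cells) != 4 or cells[0] == "LOCATION":
            continue

        location, planned, actual, diff = cells
        records.append({
            "candy_type": candy_type,
            "location": location,
            "planned": int(planned),
            "actual": int(actual),
            "diff": int(diff),
        })
    return records


for record in rows_by_candy(SAMPLE_HTML):
    print(record)
```

Tracking the current heading in a variable avoids having to search forward from each heading for its rows. If the real table nests tr elements inside the heading cells, the heading test would likely need adjusting.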
