Riggs

Reputation: 49

Filtering through multiple XPath elements on a web page

I need help scraping the remaining tables from the page at the URL below. My code only reaches the first table (CORN), and I also need the data from the SOYBEAN table. Any thoughts?

from selenium import webdriver
import pandas as pd

url = "http://markets.iowafarmbureau.com/markets/fixed.php?page=egrains"

driver = webdriver.Chrome()  # open Google Chrome (another browser such as Firefox works too)
driver.get(url)  # load the page at the URL above

table = driver.find_element_by_xpath('//td[contains(div[@class="fixedpage_heading"],   "CORN")]/table[@class="homepage_quoteboard"]') 

test={}


for row in table.find_elements_by_tag_name('tr')[1:]:
    month = str(row.find_element_by_class_name('quotefield_shortmonthonly').text)
    low = str(row.find_element_by_class_name('quotefield_low').text)
    high = str(row.find_element_by_class_name('quotefield_high').text)
    test[month]=[low, high]

    print("%s %s %s" % (month, low, high))

x=pd.DataFrame(test)
x.to_csv('test1.csv')
driver.close()

Upvotes: 1

Views: 1338

Answers (1)

alecxe

Reputation: 473753

Since both tables share the same structure, write a reusable function that locates a table by its heading label and returns its data:

def get_table_data(driver, table_name):
    table = driver.find_element_by_xpath('//div[@class="fixedpage_heading" and contains(., "{table_name}")]/following-sibling::table'.format(table_name=table_name))

    result = {}
    for row in table.find_elements_by_tag_name('tr')[1:]:
        month = str(row.find_element_by_class_name('quotefield_shortmonthonly').text)
        low = str(row.find_element_by_class_name('quotefield_low').text)
        high = str(row.find_element_by_class_name('quotefield_high').text)
        result[month]=[low, high]

    return result

print(get_table_data(driver, 'CORN'))
print(get_table_data(driver, 'SOYBEANS'))

Prints:

{'MAR': ['395-6', '399-0'], 'MAY': ['405-2', '405-2'], 'DEC': ['386-2', '390-0'], 'JUL': ['408-2', '409-4'], 'SEP': ['375-6', '378-0']}
{'MAR': ['1005-0', '1005-0'], 'AUG': ['1015-0', '1017-2'], 'SEP': ['1001-2', '1016-0'], 'MAY': ['1004-2', '1012-2'], 'JUL': ['1009-0', '1018-0'], 'JAN': ['999-6', '1002-6'], 'NOV': ['992-2', '1000-0']}
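
A note for newer Selenium releases: the find_element_by_* helpers used above were removed in Selenium 4.3, so the function as written raises AttributeError there. Below is a minimal sketch of the same function using the generic find_element / find_elements calls. The locator strings are the values behind By.XPATH, By.TAG_NAME and By.CLASS_NAME, spelled out here only so the snippet needs no imports (the three constant names are my own); in a real script you would typically import By from selenium.webdriver.common.by instead.

```python
# String values behind Selenium's By.XPATH, By.TAG_NAME and By.CLASS_NAME.
# Hypothetical local names, used so this sketch has no imports.
XPATH, TAG_NAME, CLASS_NAME = "xpath", "tag name", "class name"


def get_table_data(driver, table_name):
    """Return {month: [low, high]} for the table under the given heading."""
    xpath = (
        '//div[@class="fixedpage_heading" and contains(., "{name}")]'
        '/following-sibling::table'
    ).format(name=table_name)
    table = driver.find_element(XPATH, xpath)

    result = {}
    # Skip the first row, mirroring the [1:] slice in the function above.
    for row in table.find_elements(TAG_NAME, 'tr')[1:]:
        month = row.find_element(CLASS_NAME, 'quotefield_shortmonthonly').text
        low = row.find_element(CLASS_NAME, 'quotefield_low').text
        high = row.find_element(CLASS_NAME, 'quotefield_high').text
        result[month] = [low, high]
    return result
```

It is called exactly like the original, e.g. get_table_data(driver, 'SOYBEANS') with a driver that has already loaded the page.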

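To get both tables into a single CSV file, which is what the question is ultimately after, the returned dictionaries can be flattened into rows first. This is only a sketch, assuming Python 3: flatten_tables, write_csv and the column names are made up for illustration, and the sample values are copied from the output above (two months per table). Months are sorted alphabetically here, not in calendar order.

```python
import csv
import io

# Sample data copied from the printed output above (abbreviated); in practice
# these dicts would come from get_table_data(driver, ...).
tables = {
    'CORN': {'MAR': ['395-6', '399-0'], 'MAY': ['405-2', '405-2']},
    'SOYBEANS': {'JAN': ['999-6', '1002-6'], 'MAR': ['1005-0', '1005-0']},
}


def flatten_tables(tables):
    """Turn {table_name: {month: [low, high]}} into a list of flat rows."""
    rows = []
    for table_name in sorted(tables):
        for month in sorted(tables[table_name]):
            low, high = tables[table_name][month]
            rows.append([table_name, month, low, high])
    return rows


def write_csv(rows, stream):
    """Write a header line plus the flattened rows to a file-like object."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerow(['commodity', 'month', 'low', 'high'])
    writer.writerows(rows)


rows = flatten_tables(tables)
buffer = io.StringIO()  # swap for open('test1.csv', 'w', newline='') to write a file
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

One row per (table, month) pair keeps the file easy to filter later, whereas pd.DataFrame(test) in the question puts months in columns and would need one file per table.
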
Upvotes: 3
