jarthoben

Reputation: 129

Button click prior to scraping HTML table

Hi, I am trying to scrape an HTML table, and I have working code.

The URL, however, serves two HTML tables. The first table contains "quarterly" numbers and loads by default with the URL. When you click the button above the table, you can switch to the second table, which contains "annual" numbers.

My code only picks up the first, default (quarterly) table that appears when the URL loads.

How can I get my Python code to scrape the second, "annual" table? Can Selenium do this? If so, could anyone provide some guidance?

#!/usr/local/bin/python3

import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get("https://www.investing.com/equities/exxon-mobil-income-statement", headers=headers)
df = pd.read_html(r.content)[1]
print(df)

Many thanks

Upvotes: 1

Views: 643

Answers (3)

ASH

Reputation: 20362

Try the following:

Sub Web_Table()
    Dim HTMLDoc As New HTMLDocument
    Dim objTable As Object
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Dim objIE As InternetExplorer
    Set objIE = New InternetExplorer
    objIE.Navigate "https://www.investing.com/equities/exxon-mobil-income-statement"

    Do Until objIE.ReadyState = 4 And Not objIE.Busy
        DoEvents
    Loop
    Application.Wait (Now + TimeValue("0:00:03")) 'wait for JavaScript to load
    HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
    With HTMLDoc.body
        Set objTable = .getElementsByTagName("table")
        For lngTable = 0 To objTable.Length - 1
            For lngRow = 0 To objTable(lngTable).Rows.Length - 1
                For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                    ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
                Next lngCol
            Next lngRow
            ActRw = ActRw + objTable(lngTable).Rows.Length + 1
        Next lngTable
    End With
    objIE.Quit
End Sub


Upvotes: 0

jarthoben

Reputation: 129

After much googling and reading some other Stack Overflow posts, I finally got this working:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

# Start Firefox via geckodriver and open the page (the quarterly table loads by default)
browser = webdriver.Firefox(executable_path=r'/Users/xxxxxx/Documents/python/web_drivers/geckodriver')
browser.get('https://www.investing.com/equities/exxon-mobil-income-statement')

# Click the "Annual" link to switch to the annual table
linkElem = browser.find_element_by_link_text('Annual')
linkElem.click()

# Grab the HTML of the table now shown, then close the browser
r = browser.find_element_by_css_selector("#rrtable > table").get_attribute('outerHTML')
browser.quit()

# Parse the table HTML into a DataFrame
soup = BeautifulSoup(r, 'html.parser')
df = pd.read_html(str(soup))[0]
print(df)
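
For newer Selenium releases, where the `find_element_by_*` helpers are no longer available, the same flow could look roughly like the sketch below. This is not the code posted above: the function names `fetch_annual_table_html` and `table_html_to_dataframe`, the 10-second timeout, and the assumption that geckodriver is on the PATH are all illustrative. It also waits for the table markup to change after the click rather than reading it immediately, on the assumption that the annual table is swapped in by JavaScript.

```python
from io import StringIO

URL = "https://www.investing.com/equities/exxon-mobil-income-statement"
TABLE_SELECTOR = "#rrtable > table"  # selector taken from the code above


def fetch_annual_table_html(url=URL, timeout=10):
    """Click the 'Annual' link and return the refreshed table's outerHTML."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def table_html(driver):
        element = driver.find_element(By.CSS_SELECTOR, TABLE_SELECTOR)
        return element.get_attribute("outerHTML")

    browser = webdriver.Firefox()  # assumption: geckodriver is on the PATH
    try:
        browser.get(url)
        before = table_html(browser)  # the default (quarterly) table
        browser.find_element(By.LINK_TEXT, "Annual").click()
        # Poll until the table markup differs from the quarterly version.
        WebDriverWait(browser, timeout).until(lambda d: table_html(d) != before)
        return table_html(browser)
    finally:
        browser.quit()


def table_html_to_dataframe(table_html):
    """Parse one <table> element's outerHTML into a DataFrame."""
    import pandas as pd  # imported lazily for the same reason as above

    return pd.read_html(StringIO(table_html))[0]
```

Calling `table_html_to_dataframe(fetch_annual_table_html())` and printing the result would then replace the last few lines of the script above. Comparing the markup before and after the click is just one way to detect the switch; if the page behaves differently, the wait condition would need adjusting.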

Upvotes: 1

Nivardo Albuquerque

Reputation: 536

Yes, you can do it with Selenium. Here is some Python code for that:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # assumption: Firefox; any Selenium-supported browser should work
driver.get("https://www.investing.com/equities/exxon-mobil-income-statement")
annual_button = driver.find_element(By.CSS_SELECTOR, "#leftColumn > div.alignBottom > div.float_lang_base_1 > a:nth-child(1)")
annual_button.click()
print(driver.find_element(By.CSS_SELECTOR, "#rrtable > table").get_attribute('innerHTML'))

What does it do? It opens the page, finds the annual_button element by its CSS selector, and clicks it. Then it finds the table by its CSS selector and prints its HTML.
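
One caveat with the snippet above: `innerHTML` returns only what is inside the `<table>` element, without the `<table>` tag itself, so the printed string may not be recognized as a table if it is passed straight to something like `pd.read_html`. Below is a minimal sketch of one way around that. The helper names `wrap_table_fragment` and `fragment_to_dataframe` are made up for illustration, and asking for `outerHTML` instead of `innerHTML` would avoid the issue altogether.

```python
from io import StringIO


def wrap_table_fragment(inner_html):
    """Return HTML that is guaranteed to be enclosed in <table> tags."""
    fragment = inner_html.strip()
    if fragment.lower().startswith("<table"):
        return fragment  # already a complete table element
    return "<table>" + fragment + "</table>"


def fragment_to_dataframe(inner_html):
    """Parse a table's innerHTML (or outerHTML) into a DataFrame."""
    import pandas as pd  # imported lazily so the wrapper works without pandas

    return pd.read_html(StringIO(wrap_table_fragment(inner_html)))[0]
```

With this in place, the string returned by `get_attribute('innerHTML')` could be handed to `fragment_to_dataframe` instead of being printed.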

Hope it helps.

Upvotes: 1
