Jackey12345

Reputation: 159

Get data from table in Beautiful Soup

I am trying to retrieve the 'Shares Outstanding' of a stock via this page:

https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v#

(Click on 'Financial Statements' - 'Condensed Consolidated Balance Sheets (Unaudited) (Parenthetical)')

The data is in the bottom row of the table, on the left. I am using Beautiful Soup, but I am having trouble retrieving the share count.

The code I am using:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v#'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

rows = soup.find_all('tr')

for row in rows:
    document = row.find('a', string='Common stock, shares outstanding (in shares)')
    shares = row.find('td', class_='nump')
    if None in (document, shares):
        continue
    print(document)
    print(shares)

This returns nothing, but the desired output is 4,323,987,000.

Can someone help me retrieve this data?

Thanks!

Upvotes: 1

Views: 115

Answers (2)

Jack Fleeting

Reputation: 24930

Ah, the joys of scraping EDGAR filings :(...

You're not getting your expected output because you're looking in the wrong place. The URL you have is the iXBRL viewer; the data actually comes from here:

url = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000052/R1.htm'

You can find that URL by checking the network tab in the developer tools, or you can translate the viewer URL into it yourself: for example, 320193 is the CIK number, 0000320193-20-000052 is the accession number (with the dashes stripped in the path), and so on.
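As a rough sketch of that translation (assuming the R1.htm report name, which can vary from filing to filing):

```python
from urllib.parse import urlparse, parse_qs

def archives_url(viewer_url, report="R1.htm"):
    """Build the EDGAR Archives URL from a viewer URL.

    The report name (R1.htm here) is an assumption; other
    filings may put the data in a different Rn.htm report.
    """
    qs = parse_qs(urlparse(viewer_url).query)
    cik = qs["cik"][0]
    # the accession number appears in the path with dashes removed
    accession = qs["accession_number"][0].replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession}/{report}"

viewer = ('https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193'
          '&accession_number=0000320193-20-000052&xbrl_type=v')
print(archives_url(viewer))
# https://www.sec.gov/Archives/edgar/data/320193/000032019320000052/R1.htm
```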

Once you figure that out, the rest is simple:

import requests
from bs4 import BeautifulSoup as bs

req = requests.get(url)
soup = bs(req.text, 'lxml')
soup.select_one('.nump').text.strip()

Output:

'4,334,335'

Edit:

To search by "Shares Outstanding", try:

targets = soup.select('tr.ro')
for target in targets:
    targ = target.select('td.pl')
    for t in targ:
        if "Shares Outstanding" in t.text:
            print(target.select_one('td.nump').text.strip())

And might as well throw this one in: another way to do it is to use XPath instead, via the lxml library:

import lxml.html as lh

doc = lh.fromstring(req.text)  # req is the response from above
doc.xpath('//tr[@class="ro"]//td[@class="pl "][contains(.//text(),"Shares Outstanding")]/following-sibling::td[@class="nump"]/text()')[0]

Upvotes: 1

Pygirl

Reputation: 13349

That's a JS-rendered page. Use Selenium:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v#'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.get(url)
sleep(10)  # <--- waits 10 seconds so the page can render
soup = BeautifulSoup(driver.page_source, 'html.parser')
rows = soup.find_all('tr')

for row in rows:
    shares = row.find('td', class_='nump')
    if shares:
        print(shares)

<td class="nump">4,334,335<span></span>
</td>
<td class="nump">4,334,335<span></span>
</td>


Better, use:

shares = soup.find('td', class_='nump')
if shares:
    print(shares.text.strip())

4,334,335

Upvotes: 2
