Reputation: 5
I would like to print the company name from the Google Finance page, using the div with class appbar-snippet-primary. The code I am using returns None or []. I wasn't able to reach the span tag containing the company name using BeautifulSoup.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.google.com/finance?q=F')
soup = BeautifulSoup(html, "html.parser")
x = soup.find(id='appbar-snippet-primary')
print(x)
Thank you for the explanation. I have updated the code as you suggested and included the stock price, created a loop, then stored the information in a dictionary.
from bs4 import BeautifulSoup
import requests
x = ('F', 'GE', 'GOOGL')
Company = {}
for i in x:
    head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
    html = requests.get('https://www.google.com/finance?q=%s' % i, headers=head).content
    soup = BeautifulSoup(html, "html.parser")
    c = soup.find("div", class_="appbar-snippet-primary").text
    p = soup.find('span', class_='pr').span.text
    Company.update({c: p})
for k, v in Company.items():
    print('{:<30} {:>8}'.format(k, v))
Upvotes: 0
Views: 754
Reputation: 180441
The value is not dynamically generated by JavaScript; it is in the page source. All you need to do is send a user-agent header and search by class instead of id. The following example, using requests, gets what you want:
from bs4 import BeautifulSoup
import requests
head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
html = requests.get('https://www.google.com/finance?q=F', headers=head).content
soup = BeautifulSoup(html, "html.parser")
x = soup.find("div", class_="appbar-snippet-primary")
print(x)
Which returns:
<div class="appbar-snippet-primary"><span>Ford Motor Company</span></div>
If we run the code using x.text to pull the text, you can see the output is correct:
In [14]: from bs4 import BeautifulSoup
In [15]: import requests
In [16]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
In [17]: html = requests.get('https://www.google.com/finance?q=F', headers=head).content
In [18]: soup = BeautifulSoup(html, "html.parser")
In [19]: x = soup.find("div", class_="appbar-snippet-primary")
In [20]: print(x.text)
Ford Motor Company
Now without a user-agent:
In [21]: from bs4 import BeautifulSoup
In [22]: import requests
In [23]: html = requests.get('https://www.google.com/finance?q=F').content
In [24]: soup = BeautifulSoup(html, "html.parser")
In [25]: x = soup.find("div", class_="appbar-snippet-primary")
In [26]: print(x)
None
Here x is None because, without the user-agent header, you don't get the same source returned.
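Since the question used urlopen rather than requests, here is a minimal sketch of sending the same header with the standard library. The helper names build_request and fetch_html are made up for illustration, and this assumes Python 3:

```python
from urllib.request import Request, urlopen

# Same browser-like user-agent string as in the requests example.
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Download the page body as bytes (this performs a network call)."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

The bytes returned by fetch_html('https://www.google.com/finance?q=F') can be passed to BeautifulSoup(html, "html.parser") exactly as before.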
Upvotes: 0
Reputation: 55448
The element you're interested in looks like this
<div class="appbar-snippet-primary">
<span>Ford Motor Company</span>
</div>
So it's a div with class="appbar-snippet-primary", not id="appbar-snippet-primary" like your code implies.
However, there is a deeper problem: that div isn't set until the JavaScript on that page runs, so downloading the raw HTML and running BeautifulSoup on it won't work, because the JS hasn't been executed yet.
One of the script tags in that raw HTML contains the line var _companyName = 'Ford Motor Company'; so you can grep for _companyName = if you insist on using the raw HTML.
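A rough sketch of that grep approach with a regular expression. The function name extract_company_name is made up here, and it assumes the value is single-quoted and contains no embedded quotes, as in the line shown above:

```python
import re

# Matches: _companyName = 'Some Name'
# Assumption: the value is single-quoted with no escaped quotes inside.
COMPANY_NAME_RE = re.compile(r"_companyName\s*=\s*'([^']*)'")


def extract_company_name(raw_html):
    """Return the company name found in the raw HTML, or None if absent."""
    match = COMPANY_NAME_RE.search(raw_html)
    return match.group(1) if match else None


sample = "<script>var _companyName = 'Ford Motor Company';</script>"
print(extract_company_name(sample))
```

If the page is downloaded as bytes, decode it first (for example html.decode('utf-8', 'replace')) before passing it in.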
Alternatively, you can use Selenium, because it pilots an actual browser and runs the JS. You can then find that element by its class:
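from __future__ import print_function
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://www.google.com/finance?q=F")
# find_element_by_css_selector was removed in recent Selenium 4 releases;
# find_element(By.CSS_SELECTOR, ...) is the equivalent call.
div = driver.find_element(By.CSS_SELECTOR, '.appbar-snippet-primary')
company_name = div.text
print(company_name)
driver.quit()  # close the browser and end the WebDriver session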
I get:
Ford Motor Company
Upvotes: 1