Reputation: 111
I am trying to extract the name of a stock from its Nasdaq page.
The name is assigned to the JavaScript variable followObjTitle (var followObjTitle) in the page source.
URL: https://www.nasdaq.com/symbol/aapl
from bs4 import BeautifulSoup
import requests
import re
import json
with requests.Session() as c:
    nasdaq_baseurl = 'https://www.nasdaq.com/symbol/'
    nasdaq_url = nasdaq_baseurl.__add__("AAPL")
    url_fetch = c.get(nasdaq_url)
    soup = BeautifulSoup(url_fetch.text, 'html.parser')
    pattern = re.compile("var followObjTitle = '(.*?)';", re.MULTILINE | re.DOTALL)
    script = soup.find_all("script", text=pattern, type="text/javascript")
    name = soup.select('script')[]
    print(name)
My expected output is "Apple Inc."
How can I select that specific variable and extract its contents?
Upvotes: 0
Views: 5415
Reputation: 57125
In general, BeautifulSoup is not intended to parse JavaScript; use plain re instead. In your particular case, the line you are looking for is var followObjTitle = "Apple Inc.";. Note that it uses double quotation marks, but your regex attempts to match single quotation marks. Finally, remove re.MULTILINE | re.DOTALL: the pattern has no ^ or $ anchors and the value sits on a single line, so neither flag is needed.
pattern = re.compile(r'var followObjTitle = "(.*?)";')
pattern.findall(soup.text)
# ['Apple Inc.']
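The same idea can be sketched without any network access. The snippet below is only an illustration: SAMPLE_HTML is a hypothetical stand-in for the page source, and extract_follow_title and TITLE_PATTERN are names made up for this example. The pattern accepts either quote style, so it would match whether the page uses single or double quotation marks.

```python
import re

# Hypothetical stand-in for the page source (assumption: the real page
# contains a line of this shape, as described in the answer above).
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">
    var followObjTitle = "Apple Inc.";
</script>
</head><body></body></html>
"""

# (["']) captures the opening quote; \1 requires the same quote to close it.
TITLE_PATTERN = re.compile(r"""var\s+followObjTitle\s*=\s*(["'])(.*?)\1\s*;""")


def extract_follow_title(html):
    """Return the string assigned to followObjTitle, or None if it is absent."""
    match = TITLE_PATTERN.search(html)
    return match.group(2) if match else None


print(extract_follow_title(SAMPLE_HTML))  # prints: Apple Inc.
```

With the real page, the raw response body (url_fetch.text in the question's code) could be passed to the function directly. That sidesteps BeautifulSoup entirely, which may matter because, depending on the Beautiful Soup version, the contents of script tags might not be included in soup.text.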
Upvotes: 1