BeautifulSoup extract script variable data

Question

I am trying to extract the name of this stock.

The name is stored in a JavaScript variable, var followObjTitle, inside one of the page's script tags.

from bs4 import BeautifulSoup
import requests
import re
import json

with requests.Session() as c:
    nasdaq_baseurl = 'https://www.nasdaq.com/symbol/'
    nasdaq_url = nasdaq_baseurl + "AAPL"

    url_fetch = c.get(nasdaq_url)
    soup = BeautifulSoup(url_fetch.text, 'html.parser')

    pattern = re.compile("var followObjTitle = '(.*?)';", re.MULTILINE | re.DOTALL)
    script = soup.find_all("script", text=pattern, type="text/javascript")
    name = soup.select('script')[]  # stuck here: not sure which element to select
    print(name)

My expected output is "Apple Inc."

How can I select that specific variable and extract its contents?

DYZ · Accepted Answer

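In general, BeautifulSoup is not intended to parse JavaScript; it can hand you the text of a script tag, but not the values inside it. Use plain re for that. In your particular case, the line you are looking for is var followObjTitle = "Apple Inc.";. Note that it uses double quotation marks, whereas your regex tries to match single quotation marks, so it never matches. Finally, remove re.MULTILINE | re.DOTALL: the pattern has no ^ or $ anchors for re.MULTILINE to affect, and the value sits on a single line, so re.DOTALL is not needed either.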

pattern = re.compile("var followObjTitle = \"(.*?)\";")
pattern.findall(soup.text)
#['Apple Inc.']
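
If you would rather not depend on which quotation marks the page uses, here is a minimal sketch of the same idea that accepts either kind. It runs the regex over the raw HTML string instead of soup.text, because some BeautifulSoup versions may leave script contents out of soup.text. SAMPLE_HTML, the followObjId variable and the helper name extract_js_string_var are made up for illustration; the real page markup may differ.

```python
import re

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">
    var followObjTitle = "Apple Inc.";
    var followObjId = 'AAPL';
</script>
</head><body></body></html>
"""


def extract_js_string_var(html, var_name):
    """Return the string assigned to `var <var_name>` in html, or None."""
    # (["']) captures the opening quote and \1 requires the same closing
    # quote, so both "..." and '...' assignments match.
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"""\s*=\s*(["'])(.*?)\1\s*;"""
    )
    match = pattern.search(html)
    return match.group(2) if match else None


print(extract_js_string_var(SAMPLE_HTML, "followObjTitle"))  # Apple Inc.
```

With the code from the question, you would pass url_fetch.text as the first argument. The function returns None when the variable is not found, which is worth checking for, since the page layout can change.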
