Python BS4 Not Retrieving Results

Question

Using the code below, I can fetch "soup" without an issue. My goal is ultimately to fetch the title within the soup object, but I'm having trouble figuring out how to do it. In addition to the code below, I've tried various iterations of soup['results'], soup.results, soup.get_text().results, etc., and I'm not sure how to get to it. I could, of course, call soup.get_text() and then run some kind of search for the string "title", but I feel there has to be a built-in method for this.

55)get_title()
     54     ipdb.set_trace()
---> 55     title = soup.html.head.title.string
     56     title = re.sub(r'[^\x00-\x7F]+',' ', title)

ipdb> type(soup)

ipdb> soup.title
ipdb> print soup.title
None
ipdb> soup
{"status":"OK","copyright":"Copyright (c) 2018 The New York Times Company. All Rights Reserved.","section":"home","last_updated":"2018-01-07T06:19:00-05:00","num_results":42,"results":[{"section":"Briefing","subsection":"",**"title":"Trump, Palestinians, Golden Globes: Your Weekend Briefing"**, ....

Code

from __future__ import division

import regex as re
import string
import urllib2

from bs4 import BeautifulSoup
from cookielib import CookieJar
import ipdb

PARSER_TYPE = 'html.parser'

def get_title(url):
    cj = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    p = opener.open(url)
    soup = BeautifulSoup(p.read(), PARSER_TYPE) # This loads fine
    ipdb.set_trace()
    title = soup.html.head.title.string # This is sad
    title = re.sub(r'[^\x00-\x7F]+',' ', title)
    return title
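
A note on the debugger output above: what `soup` prints looks like a JSON document rather than HTML, which would explain why `soup.title` is None (there is no `<title>` tag to find). If that is the case, one option is to skip BeautifulSoup for this URL and parse the body with the standard `json` module. The following is only a minimal sketch under that assumption. `SAMPLE_BODY` is an abbreviated stand-in modeled on the output shown above, and `get_titles` is a hypothetical helper name, not something from the original code.

```python
import json
import re

# Abbreviated stand-in for the response body printed in the debugger session;
# the real body has more fields and more entries under "results".
SAMPLE_BODY = (
    '{"status":"OK","section":"home",'
    '"results":[{"section":"Briefing","subsection":"",'
    '"title":"Trump, Palestinians, Golden Globes: Your Weekend Briefing"}]}'
)


def get_titles(body):
    """Return the title of every entry under "results", ASCII-cleaned."""
    data = json.loads(body)  # JSON text -> Python dict
    titles = []
    for item in data.get("results", []):
        title = item.get("title", "")
        # Same cleanup as the original code: replace non-ASCII runs with a space
        titles.append(re.sub(r'[^\x00-\x7F]+', ' ', title))
    return titles


titles = get_titles(SAMPLE_BODY)
print(titles[0])
```

In the original function, this would mean passing `p.read()` to `json.loads` instead of to `BeautifulSoup`, and then indexing the resulting dict (for example `data["results"][0]["title"]`) rather than walking `soup.html.head.title`.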
