lela_rib

Reputation: 147

Access javascript text using Beautiful Soup

I want to save some awards information from the IMDb website, but I can't access the JavaScript text I need.

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

urls = [
    'https://www.imdb.com/event/ev0000003/2000',
    'https://www.imdb.com/event/ev0000003/2001',
]

for url in urls:
    response = requests.get(url).content
    soup = BeautifulSoup(response, 'html.parser')
    scripts = soup.find_all('script', {'type':'text/javascript'})


Now, how can I access only the categories information, which looks like this:

"categories":[{"categoryName":"Best Actor in a Leading Role","nominations":[{"primaryNominees":[{"name":"Kevin Spacey","note":null,"imageUrl":.....  

Since I'll have to do this for different awards and years, my idea is to save the results in a JSON file like this:

{"award": "oscars",  
 "year": "2000",  
 "data": [{"categoryName":"Best Actor in a Leading Role","nominations":[{"primaryNominees":[{"name":"Kevin Spacey","note":null,"imageUrl":.....  
}

Upvotes: 1

Views: 69

Answers (1)

Andrej Kesely

Reputation: 195553

The data is embedded in a JavaScript snippet inside the page, so you can extract it with a regular expression, for example, and then parse the matched text with the json module.

For example:

import re
import json
import requests

urls = [
    'https://www.imdb.com/event/ev0000003/2000',
    'https://www.imdb.com/event/ev0000003/2001',
]

for url in urls:
    response = requests.get(url).text

    data = json.loads( re.findall(r'IMDbReactWidgets\.NomineesWidget\.push.*?(\{.*\})', response)[0] )

    # print(json.dumps(data, indent=4)) # <-- uncomment this to print all data

    for award in data['nomineesWidgetModel']['eventEditionSummary']['awards']:
        if award['awardName'] != 'Oscar':
            continue
        for category in award['categories']:
            print(category['categoryName'])

    print('-' * 80)

Prints:

Best Actor in a Leading Role
Best Actor in a Supporting Role
Best Actress in a Leading Role
Best Actress in a Supporting Role
Best Art Direction-Set Decoration
Best Cinematography
Best Costume Design
Best Director
Best Documentary, Features
Best Documentary, Short Subjects
Best Effects, Sound Effects Editing
Best Effects, Visual Effects
Best Film Editing
Best Foreign Language Film

...and so on.
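
To get from the printed category names to the JSON file described in the question, something like the following might work. It is only a sketch: it assumes the page still embeds the data in the IMDbReactWidgets.NomineesWidget.push(...) call matched above, and the helper names build_record and save_record, the 'oscars' label, and the output path are hypothetical choices, not anything IMDb or the question defines.

```python
import json
import re

# Same pattern as in the answer above: capture the JSON object passed to
# IMDbReactWidgets.NomineesWidget.push(...). Assumes it sits on one line.
WIDGET_RE = re.compile(r'IMDbReactWidgets\.NomineesWidget\.push.*?(\{.*\})')


def build_record(html, award_name, label, year):
    """Build {"award", "year", "data"} from one event page (hypothetical helper).

    Returns None when the widget call is not found in the page text.
    """
    match = WIDGET_RE.search(html)
    if match is None:
        return None
    widget = json.loads(match.group(1))
    awards = widget['nomineesWidgetModel']['eventEditionSummary']['awards']
    categories = []
    for award in awards:
        # Keep only the categories of the requested award, e.g. 'Oscar'.
        if award['awardName'] == award_name:
            categories.extend(award['categories'])
    return {'award': label, 'year': year, 'data': categories}


def save_record(record, path):
    """Write one record to a JSON file (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as fh:
        json.dump(record, fh, ensure_ascii=False, indent=2)
```

Inside the loop over urls, html would be requests.get(url).text and the year could be taken from the URL itself, for example url.rsplit('/', 1)[-1]. Returning None instead of indexing [0] keeps a page without the widget from raising an IndexError, so it can simply be skipped.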

Upvotes: 2
