Reputation: 147
I want to save some awards information from the IMDb website, but I can't access the JavaScript text I need.
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

urls = [
    'https://www.imdb.com/event/ev0000003/2000',
    'https://www.imdb.com/event/ev0000003/2001',
]

for url in urls:
    # Fetch the raw page and parse it
    response = requests.get(url).content
    soup = BeautifulSoup(response, 'html.parser')
    # The awards data sits inside one of these <script> tags
    scripts = soup.find_all('script', {'type': 'text/javascript'})
Now, how can I access only the categories information:
"categories":[{"categoryName":"Best Actor in a Leading Role","nominations":[{"primaryNominees":[{"name":"Kevin Spacey","note":null,"imageUrl":.....
Since I'll have to do this for different awards and years, my idea is to save them in a json file:
{"award": "oscars",
"year": "2000",
"data": [{"categoryName":"Best Actor in a Leading Role","nominations":[{"primaryNominees":[{"name":"Kevin Spacey","note":null,"imageUrl":.....
}
Upvotes: 1
Views: 69
Reputation: 195553
The data is embedded in JavaScript on the page, so you can extract it with a regular expression, for example. To parse the extracted text, you can use the json module.
For example:
import re
import json
import requests

urls = [
    'https://www.imdb.com/event/ev0000003/2000',
    'https://www.imdb.com/event/ev0000003/2001',
]

for url in urls:
    response = requests.get(url).text
    # Grab the JSON object passed to IMDbReactWidgets.NomineesWidget.push(...)
    data = json.loads(re.findall(r'IMDbReactWidgets\.NomineesWidget\.push.*?(\{.*\})', response)[0])
    # print(json.dumps(data, indent=4))  # <-- uncomment this to print all data
    for award in data['nomineesWidgetModel']['eventEditionSummary']['awards']:
        # Skip awards other than the Oscar itself
        if award['awardName'] != 'Oscar':
            continue
        for category in award['categories']:
            print(category['categoryName'])
    # Separator between years
    print('-' * 80)
Prints:
Best Actor in a Leading Role
Best Actor in a Supporting Role
Best Actress in a Leading Role
Best Actress in a Supporting Role
Best Art Direction-Set Decoration
Best Cinematography
Best Costume Design
Best Director
Best Documentary, Features
Best Documentary, Short Subjects
Best Effects, Sound Effects Editing
Best Effects, Visual Effects
Best Film Editing
Best Foreign Language Film
...and so on.
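To get from the printed names to the JSON layout asked for in the question, something along these lines might work. It is only a sketch: the helper names extract_widget_data, build_record and save_record are made up for illustration, and it assumes the page still embeds the data in the same IMDbReactWidgets.NomineesWidget.push(...) call and that 'Oscar' is the awardName you want to keep.

```python
import json
import re

# Same pattern as in the loop above. Assumption: the page still passes the
# data to IMDbReactWidgets.NomineesWidget.push(...) on a single line.
WIDGET_RE = re.compile(r'IMDbReactWidgets\.NomineesWidget\.push.*?(\{.*\})')


def extract_widget_data(html):
    """Return the parsed JSON object embedded in the page, or None if absent."""
    match = WIDGET_RE.search(html)
    return json.loads(match.group(1)) if match else None


def build_record(data, award_label, award_name, year):
    """Collect the categories of one award into the layout from the question."""
    awards = data['nomineesWidgetModel']['eventEditionSummary']['awards']
    categories = []
    for award in awards:
        if award['awardName'] == award_name:
            categories.extend(award['categories'])
    return {'award': award_label, 'year': year, 'data': categories}


def save_record(record, path):
    """Write one record to a JSON file (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as fh:
        json.dump(record, fh, indent=4)
```

Inside the loop you could then call extract_widget_data(response), take the year from the URL with url.rsplit('/', 1)[-1], and pass the result of build_record(data, 'oscars', 'Oscar', year) to save_record with a file name of your choice. Checking for None first avoids the IndexError that [0] raises when the pattern is not found.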
Upvotes: 2