Reputation: 2296
I am trying to scrape information from the following URL: http://www.mobygames.com/game/xbox360/wheelman/credits with this code:
# Imports
import requests
from bs4 import BeautifulSoup
credit_link = "http://www.mobygames.com/game/xbox360/wheelman/credits"
response = requests.get(credit_link)
soup = BeautifulSoup(response.text, "lxml")
credit_infor = soup.find("div", class_="col-md-8 col-lg-8")
credit_infor1 = credit_infor.select('table[summary="List of Credits"]')[0].find_all('tr')
This is the format that I need to get:
info          credit_to  studio                   game      console
starring      138920     starring                 Wheelman  Xbox 360
Studio Heads  151851     Midway Newcastle Studio  Wheelman  Xbox 360
Studio Heads  73709      Midway Newcastle Studio  Wheelman  Xbox 360
Here info corresponds to the first "td" in each row, credit_to corresponds to the id of the particular contributor (e.g. 138920 is the id of Vin Diesel), and studio corresponds to the section titles (e.g. starring). I think I can handle everything except getting the studio name (i.e. the title) next to each row (it switches from Midway Newcastle Studio to San Diego QA Team later, and so on). How could I do it?
Upvotes: 1
Views: 65
Reputation: 7238
According to your program, credit_infor1 will hold a list of all tr tags (rows). If you check the HTML, the rows that contain a title (studio) have no class attribute, while all the other rows have a class="crln" attribute.
So you can iterate over all the rows and check whether the current row has a class attribute using the has_attr() function (which is somewhat hidden in the docs). If the attribute is not present, update the current title and move on to the next row; otherwise, scrape the other data from the row.
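To see the check in isolation, here is a minimal sketch on a made-up snippet. The markup below is a hypothetical stand-in that only mimics the structure described above (title rows without a class, credit rows with class="crln"); it is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the described structure, not the real page
html = """
<table>
  <tr><td colspan="2"><h2>Midway Newcastle Studio</h2></td></tr>
  <tr class="crln"><td>Studio Heads</td><td>some names</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# has_attr() returns True only when the tag carries the given attribute
flags = [row.has_attr("class") for row in rows]
print(flags)  # [False, True]
```

The title row yields False and the credit row yields True, which is what lets the loop tell the two kinds of rows apart.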
Continuing your program:
studio = ''
for row in credit_infor1:
    if not row.has_attr('class'):
        studio = row.h2.text
        continue
    # get other values that you want from this row below
    info = row.find('td').text
    # similarly get all the other values you need each time
    print(info + ' | ' + studio)
Partial output:
Starring | Starring
Studio Heads | Midway Newcastle Studio
Executive Producers | Midway Newcastle Studio
Technical Directors | Midway Newcastle Studio
Lead Programmers | Midway Newcastle Studio
...
QA Manager | San Diego QA Team
Compliance QA Manager | San Diego QA Team
QA Data Analyst | San Diego QA Team
...
SQA Analyst | SQS India QA
QA Team | SQS India QA
Executive Producers | Tigon Studios
Head of Game Production | Tigon Studios
...
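If it helps, the same idea can be extended to build the full rows from the question. This is only a sketch under assumptions: the sample markup is made up, the names "Person A" and "Person B" are placeholders, and I am assuming each contributor link contains the id in the form developerId,<number>, which you should verify against the real page. extract_credits is a hypothetical helper name.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the link format (developerId,<number>) is an assumption
SAMPLE_HTML = """
<table summary="List of Credits">
  <tr><td colspan="2"><h2>Starring</h2></td></tr>
  <tr class="crln"><td>Starring</td>
      <td><a href="/developer/sheet/view/developerId,138920/">Vin Diesel</a></td></tr>
  <tr><td colspan="2"><h2>Midway Newcastle Studio</h2></td></tr>
  <tr class="crln"><td>Studio Heads</td>
      <td><a href="/developer/sheet/view/developerId,151851/">Person A</a>,
          <a href="/developer/sheet/view/developerId,73709/">Person B</a></td></tr>
</table>
"""


def extract_credits(html, game, console):
    """Return one dict per (role, contributor) pair, tagged with its studio."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select('table[summary="List of Credits"]')[0].find_all("tr")
    records = []
    studio = ""
    for row in rows:
        # Rows without a class attribute are treated as title (studio) rows
        if not row.has_attr("class"):
            if row.h2 is not None:
                studio = row.h2.get_text(strip=True)
            continue
        info = row.find("td").get_text(strip=True)
        # One record per contributor link found in the row
        for link in row.find_all("a", href=True):
            match = re.search(r"developerId,(\d+)", link["href"])
            if match:
                records.append({
                    "info": info,
                    "credit_to": match.group(1),
                    "studio": studio,
                    "game": game,
                    "console": console,
                })
    return records


records = extract_credits(SAMPLE_HTML, "Wheelman", "Xbox 360")
for rec in records:
    print(rec["info"], rec["credit_to"], rec["studio"], rec["game"], rec["console"])
```

To use it on the live page, you would pass response.text instead of SAMPLE_HTML. Returning a list of dicts keeps it easy to load into a CSV writer or a DataFrame afterwards.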
Upvotes: 1