How do I web scrape the names of the production companies from IMDB website

Question

I need to scrape the names of the production companies of some movies. I keep trying with the anchor tag a and the class that encloses the names, but it does not return the production companies.

URL : https://www.imdb.com/title/tt0473553/?ref_=fn_al_tt_1

Here's the part of the page's HTML that I want to scrape:


  
    
      Production companies
        
          
            
                IDT Entertainment
            
            
                New Arc Entertainment

Here's what I have tried:

import requests
from bs4 import BeautifulSoup

movie_url="https://www.imdb.com/title/tt0473553/?ref_=fn_al_tt_1"
movie_page = requests.get(movie_url)
soup = BeautifulSoup(movie_page.text, 'html.parser')

#movies_comp = soup.find_all("li", class_="ipc-inline-list__item")
movies_comp = soup.find_all("a", class_="ipc-metadata-list-item__list-content-item ipc-metadata-list-item__list-content-item--link")

print(movies_comp)

I am not getting the desired output. I expect it to return something like:

['IDT Entertainment', 'New Arc Entertainment']

imxitiz · Accepted Answer

Here's what you can try:

import requests

from bs4 import BeautifulSoup

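# Fetching the live page would look like this (the response is replaced below):
page = requests.get("https://www.imdb.com/title/tt0473553/?ref_=fn_al_tt_1")

# For a reproducible example, parse the HTML fragment from the question instead:
page = """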

  
    
      Production companies
        
          
            
                IDT Entertainment
            
            
                New Arc Entertainment
            
          
        
      
    
  

"""

soup=BeautifulSoup(page,"lxml")

# To understand this, here is the structure of the data you want to extract:
# li (data-testid="title-details-companies")
    # ul
        # a: IDT Entertainment
        # a: New Arc Entertainment

print([a.text for a in soup.find("li",attrs={'class':r'ipc-metadata-list__item ipc-metadata-list-item--link','data-testid':r'title-details-companies'})
                                .find("ul",class_="ipc-inline-list ipc-inline-list--show-dividers ipc-inline-list--inline ipc-metadata-list-item__list-content base")
                                    .find_all("a")])

Output :

['IDT Entertainment', 'New Arc Entertainment']

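There are several a tags with that class on the page, so searching by class alone returns more than the production companies. Narrowing the search to the li with data-testid title-details-companies first gives only the names you want.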
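
As a minimal sketch of the same idea, the lookup can also be written as a single CSS selector with select. The HTML fragment below is an assumption, rebuilt only from the tag, class, and data-testid names used in the code above, plus a hypothetical second row and label class added for contrast. IMDb's real markup may differ and can change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: assumed structure, not copied from the live IMDb page.
# The "title-details-languages" row and the label class are made up for contrast.
html = """
<ul>
  <li data-testid="title-details-companies" class="ipc-metadata-list__item">
    <a class="ipc-metadata-list-item__label">Production companies</a>
    <div>
      <ul class="ipc-inline-list">
        <li><a class="ipc-metadata-list-item__list-content-item">IDT Entertainment</a></li>
        <li><a class="ipc-metadata-list-item__list-content-item">New Arc Entertainment</a></li>
      </ul>
    </div>
  </li>
  <li data-testid="title-details-languages" class="ipc-metadata-list__item">
    <ul class="ipc-inline-list">
      <li><a class="ipc-metadata-list-item__list-content-item">English</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the companies row, then take the links inside its inner list.
links = soup.select('li[data-testid="title-details-companies"] ul a')
companies = [a.get_text(strip=True) for a in links]
print(companies)
```

In this fragment, matching on the class alone, as in the question, would also pick up the English link, which is why the search is anchored to the data-testid row first.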
