Scraping the Fortune 500 company list for 2021 using Python

Question

I am new to Python and currently learning the language. For educational purposes, I am trying to scrape the Fortune 500 list of companies from https://fortune.com/fortune500/2021/search for my analysis.

I'm stuck: I got as far as the code below, but the result is empty. Can someone help? I'd appreciate it.

I'm using an .ipynb notebook in Google Colab for this exercise, and I want to write the contents of the table to a CSV file.

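from urllib.request import urlopen
from urllib.error import URLError
from bs4 import BeautifulSoup

url = "https://fortune.com/fortune500/2021/search"

try:
    page = urlopen(url)
except URLError as err:
    # Without a response there is nothing to parse, so stop here
    print("Error opening the URL:", err)
    raise

# create a BeautifulSoup object for parsing
soup = BeautifulSoup(page, 'html.parser')

# look for the div that holds the table rows
table_div = soup.find('div', {'class': 'rt-tbody'})
print(table_div)  # this comes back empty (None)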

Gyanendro Kh · Accepted Answer

You can't scrape this table with BeautifulSoup alone. The table is rendered with JavaScript, so it isn't present in the source HTML of the page that urlopen downloads. If you want to scrape it anyway, you can use pyppeteer to render the page first and then get the resulting HTML.
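A minimal sketch of the pyppeteer approach, assuming pyppeteer is installed (pip install pyppeteer; it downloads Chromium on first launch) and that the table body still uses the rt-tbody class from the question. The render_html name, the --no-sandbox flag and the 30-second timeout are illustrative assumptions, not something the site requires. In Colab, Chromium may also need extra system packages.

```python
URL = "https://fortune.com/fortune500/2021/search"


async def render_html(url: str, selector: str = "div.rt-tbody",
                      timeout_ms: int = 30000) -> str:
    """Open url in headless Chromium and return the page HTML once selector exists."""
    # Imported inside the function so that defining it does not require pyppeteer.
    from pyppeteer import launch

    # '--no-sandbox' is often needed when Chromium runs as root, e.g. in Colab.
    browser = await launch(args=["--no-sandbox"])
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Wait for the JavaScript-rendered table body to appear in the DOM.
        await page.waitForSelector(selector, {"timeout": timeout_ms})
        # content() serializes the current, rendered DOM back to HTML.
        return await page.content()
    finally:
        # Always shut the browser down, even if the wait times out.
        await browser.close()
```

In Colab or Jupyter, which already run an event loop, top-level await generally works: html = await render_html(URL). In a plain script, use html = asyncio.run(render_html(URL)). The returned HTML can then go through the same BeautifulSoup(html, 'html.parser') and soup.find('div', {'class': 'rt-tbody'}) calls as in the question.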

However, the data itself is served from this URL:
https://content.fortune.com/wp-json/irving/v1/data/franchise-search-results?list_id=3040727&token=Zm9ydHVuZTpCcHNyZmtNZCN5SndjWkkhNHFqMndEOTM=
The URL contains a token, so it may stop working after some time. If it does, open your browser's DevTools, go to the Network tab, filter by XHR, and reload the page to find the current URL.
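If that URL (or a fresh one copied from DevTools) still works, you may not need a browser at all. Below is a sketch using only the standard library, assuming the endpoint returns JSON: fetch_json downloads and decodes it, and write_csv writes a list of flat dictionaries to a CSV file, which is what the question was after. The function names and the browser-like User-Agent header are assumptions. The layout of the JSON response isn't described here, so inspect it first and pick out the list of companies yourself.

```python
from __future__ import annotations

import csv
import json
from urllib.request import Request, urlopen

# URL from the answer; its token may have expired, so replace it with the
# current one from DevTools > Network > XHR if the request fails.
DATA_URL = ("https://content.fortune.com/wp-json/irving/v1/data/"
            "franchise-search-results?list_id=3040727"
            "&token=Zm9ydHVuZTpCcHNyZmtNZCN5SndjWkkhNHFqMndEOTM=")


def fetch_json(url: str) -> object:
    """Download url and decode the response body as JSON."""
    # Assumption: some sites reject urllib's default User-Agent, so send a browser-like one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def write_csv(records: list[dict], path: str) -> None:
    """Write flat dicts to CSV; the header is the union of keys in first-seen order."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Keys missing from a record are written as empty cells (restval="").
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```

For example, run data = fetch_json(DATA_URL), then print(json.dumps(data, indent=2)[:2000]) to look at the structure, and finally write_csv(companies, "fortune500_2021.csv"), where companies (a hypothetical name) is the list of records you located, flattened into plain key/value pairs if they are nested.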
