Reputation: 25
I am new to Python and currently learning the language. For educational purposes, I am trying to scrape the Fortune 500 list of companies from https://fortune.com/fortune500/2021/search for my analysis.
I am stuck. I got as far as the code below, but the result is empty. Can someone help? I'd appreciate it.
I am using an .ipynb notebook in Google Colab for this exercise, and I want to write the contents of the table to a CSV file.
from urllib.error import URLError
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://fortune.com/fortune500/2021/search"
try:
    page = urlopen(url)
except URLError:
    print("Error opening the URL")
    raise  # stop here; `page` is undefined if the request failed

# create a BeautifulSoup object for parsing
soup = BeautifulSoup(page, 'html.parser')
# look for the container that holds the table rows
table_div = soup.find('div', {'class': 'rt-tbody'})
Upvotes: 0
Views: 1111
Reputation: 405
You can't scrape this site using BeautifulSoup alone. The table is rendered with JavaScript, so it isn't present in the page's source HTML. If you want to scrape the rendered page, you can use pyppeteer to render it first and then get the HTML.
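A minimal sketch of the pyppeteer route, in case it helps. The function name `render_html`, the default selector (taken from the `rt-tbody` class in the question), the timeout, and the `--no-sandbox` flag (often needed in containerized environments such as Colab) are my assumptions, not something the site documents.

```python
import asyncio

URL = "https://fortune.com/fortune500/2021/search"


async def render_html(url, selector="div.rt-tbody", timeout_ms=30000):
    """Render `url` in headless Chromium and return the HTML after JavaScript has run."""
    # Imported lazily; requires `pip install pyppeteer` (it downloads Chromium on first use).
    from pyppeteer import launch

    # `--no-sandbox` is an assumption for container environments; drop it if not needed.
    browser = await launch(headless=True, args=["--no-sandbox"])
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Wait until the JavaScript-rendered table body exists in the DOM.
        await page.waitForSelector(selector, {"timeout": timeout_ms})
        return await page.content()
    finally:
        await browser.close()


# In a plain script:   html = asyncio.run(render_html(URL))
# In a notebook, where an event loop is already running:   html = await render_html(URL)
```

The returned string can then be passed to `BeautifulSoup(html, 'html.parser')` exactly as in the question, and `soup.find('div', {'class': 'rt-tbody'})` should no longer come back empty, provided the page still uses that class name.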
However, the data can be found at this URL:
https://content.fortune.com/wp-json/irving/v1/data/franchise-search-results?list_id=3040727&token=Zm9ydHVuZTpCcHNyZmtNZCN5SndjWkkhNHFqMndEOTM=
It contains a token, so the URL may stop working after some time, but you can still open DevTools and look up the URL in the Network tab, filtered by XHR.
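Here is a rough sketch of turning that JSON response into a CSV using only the standard library. I haven't documented the response layout here, so the code assumes nothing beyond valid JSON: it simply picks the longest list of objects it can find, which is a heuristic and may need adjusting once you inspect the real payload. The names `fetch_json`, `largest_record_list`, `to_csv_text`, and the `fortune500.csv` filename are all made up for illustration, and the browser-like `User-Agent` header is an assumption.

```python
import csv
import io
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Download `url` and decode the response body as JSON."""
    # Assumption: a browser-like User-Agent avoids being rejected as a bot.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return json.load(response)


def largest_record_list(node):
    """Return the longest list of dicts found anywhere inside `node` (heuristic)."""
    best = []
    if isinstance(node, list):
        if node and all(isinstance(item, dict) for item in node):
            best = node
        children = node
    elif isinstance(node, dict):
        children = list(node.values())
    else:
        children = []
    for child in children:
        candidate = largest_record_list(child)
        if len(candidate) > len(best):
            best = candidate
    return best


def to_csv_text(records):
    """Serialize a list of dicts as CSV text, one row per dict."""
    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        # Nested values are stored as JSON text so each record stays on one row.
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        })
    return buffer.getvalue()


# Usage, with the request URL copied from the Network tab:
# records = largest_record_list(fetch_json(data_url))
# with open("fortune500.csv", "w", newline="") as f:
#     f.write(to_csv_text(records))
```

If the rows turn out to be nested more deeply (for example, each company stored as a list of key/value pairs), the nested part will show up as JSON text in a single column, and you would flatten it in an extra step after looking at the actual structure.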
Upvotes: 4