extracting data from an HTML table using BeautifulSoup

Question

UPDATE:

Following Pygirl's suggestion I am attempting to use Selenium, but I'm still only getting the sector data:

import requests
import csv
import pandas as pd
from requests import get
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.set_window_size(1024, 600)
driver.maximize_window()
driver.get('https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance')
action = ActionChains(driver)
sleep(4)
industry_link = driver.find_element_by_css_selector('#tab_industry')
action.move_to_element(industry_link)
action.click(industry_link)
action.perform()

url = driver.current_url
r = requests.get(url)

sleep(10)

df_industry_list = pd.read_html(r.text)
df_industry = df_industry_list[0]
df_industry.head()
df_industry.to_excel("SectorPerf.xlsx", sheet_name = "Industry")

I'm trying to get the data from the Industry link of this url: https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance

I have written some code that gets the data from the Sector tab, but the same approach doesn't work for the Industry tab, because the URL appears to be the same for both tabs:

import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd
from requests import get

url = 'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance'
r = requests.get(url)
#soup = BeautifulSoup(response.content, 'html.parser')

#sectors = soup.find("table", id="perfTableSort")
df_list = pd.read_html(r.text)
df = df_list[0]
df.head()
#print(df)

Given that the URL seems to be the same (at least it shows the same in my address bar in Chrome), how can I also get the Industry data?

Thanks

pritam samanta · Accepted Answer

Try this:

import requests
import pandas as pd

url = 'https://eresearch.fidelity.com/eresearch/markets_sectors/si_performance.jhtml'

# Form payloads that select which tab the server renders
industry = {'tab': 'industry'}
sector = {'tab': 'sector'}

# POST the tab choice instead of relying on the URL, which is the same for both tabs
r = requests.post(url, data=industry)

# read_html returns a list of DataFrames; take the first table on the page
df_list = pd.read_html(r.text)
df = df_list[0]
df.head()

Now you can pass data=industry or data=sector to requests.post to get the table you want.
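
If you need both tables, the same idea can be wrapped in a small helper. This is only a sketch: the names build_payload, fetch_tab_table and VALID_TABS are made up for illustration, the URL and the 'tab' form field are taken from the snippet above, and it assumes the site still answers a plain POST without cookies or extra headers.

```python
from io import StringIO

# URL and form field copied from the answer above; not independently verified.
URL = 'https://eresearch.fidelity.com/eresearch/markets_sectors/si_performance.jhtml'
VALID_TABS = ('sector', 'industry')


def build_payload(tab):
    """Return the form data that selects a tab (hypothetical helper)."""
    if tab not in VALID_TABS:
        raise ValueError(f"tab must be one of {VALID_TABS}, got {tab!r}")
    return {'tab': tab}


def fetch_tab_table(tab, url=URL, timeout=30):
    """POST the tab choice and return the first HTML table as a DataFrame."""
    # Imported here so build_payload can be used without the
    # third-party packages installed.
    import pandas as pd
    import requests

    response = requests.post(url, data=build_payload(tab), timeout=timeout)
    response.raise_for_status()  # fail early on HTTP errors
    # Wrapping the HTML in StringIO avoids the literal-string
    # deprecation warning in recent pandas versions.
    tables = pd.read_html(StringIO(response.text))
    return tables[0]
```

For example, fetch_tab_table('industry').to_excel("SectorPerf.xlsx", sheet_name="Industry") would write the Industry table to the workbook from the question, assuming an Excel writer such as openpyxl is installed.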
