Reputation: 117
UPDATE:
After Pygirl's suggestion I am attempting to use Selenium, but I'm still only getting the Sector data:
import requests
import csv
import pandas as pd
from requests import get
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from time import sleep
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.set_window_size(1024, 600)
driver.maximize_window()
driver.get('https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance')
action = ActionChains(driver)
sleep(4)
industry_link = driver.find_element(By.CSS_SELECTOR, '#tab_industry')
action.move_to_element(industry_link)
action.click(industry_link)
action.perform()
url = driver.current_url
r = requests.get(url)
sleep(10)
df_industry_list = pd.read_html(r.text)
df_industry = df_industry_list[0]
df_industry.head()
df_industry.to_excel("SectorPerf.xlsx", sheet_name = "Industry")
I'm trying to get the data from the Industry tab of this URL: https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance
I have written some code that gets the Sector tab data; however, the same approach doesn't work for the Industry tab, because the URL appears to be the same for both the Sector and the Industry tab:
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd
from requests import get
url = 'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance'
r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')
#sectors = soup.find("table", id="perfTableSort")
df_list = pd.read_html(r.text)
df = df_list[0]
df.head()
#print(df)
Given that the URL seems to be the same (at least it shows the same in Chrome's address bar), how can I also get the Industry data?
Thanks
Upvotes: 0
Views: 82
Reputation: 13349
Use driver.page_source. Extract the table part and store it as CSV or Excel.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.set_window_size(1024, 600)
driver.maximize_window()
driver.get('https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/si_performance.jhtml?tab=siperformance')
sleep(2)  # give the page time to render before reading it
print(driver.page_source)  # <--- this will give you source code for Sector
industry_link = driver.find_element(By.XPATH, '//*[@id="tab_industry"]')
industry_link.click()
sleep(2)  # wait for the Industry table to load after the click, before reading the source
print(driver.page_source)  # <--- this will give you source code for Industry
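The code above only prints driver.page_source; it stops short of the "extract the table and store it" step. A minimal sketch of that step using only Python's standard html.parser and csv modules (a swapped-in, dependency-free technique, not what the answer itself uses) is below. The names TableExtractor and table_to_csv are hypothetical, and the table id perfTableSort is an assumption borrowed from the commented-out BeautifulSoup line in the question; check the Industry tab's actual markup before relying on it.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text, row by row, from the <table> whose id matches table_id.

    Hypothetical helper; the target id must be checked against the real page.
    """

    def __init__(self, table_id):
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.table_id = table_id
        self.depth = 0      # > 0 while inside the target table (counts nested tables)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text fragments of the cell being built, or None

    def _finish_cell(self):
        # Collapse runs of whitespace/newlines inside the cell to single spaces.
        if self._cell is not None and self._row is not None:
            self._row.append(' '.join(''.join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            if self.depth:
                self.depth += 1  # a table nested inside the target table
            elif dict(attrs).get('id') == self.table_id:
                self.depth = 1   # entering the target table
        elif self.depth == 1:
            if tag == 'tr':
                self._finish_row()   # tolerate an omitted </tr>
                self._row = []
            elif tag in ('td', 'th') and self._row is not None:
                self._finish_cell()  # tolerate an omitted </td>
                self._cell = []

    def handle_endtag(self, tag):
        if tag == 'table':
            if self.depth == 1:
                self._finish_row()   # flush a row left open at the end of the table
            if self.depth:
                self.depth -= 1
        elif self.depth == 1:
            if tag in ('td', 'th'):
                self._finish_cell()
            elif tag == 'tr':
                self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(page_source, table_id):
    """Return the matching table as CSV text (empty string if it is not found)."""
    parser = TableExtractor(table_id)
    parser.feed(page_source)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()
```

After the Industry click, something like writing table_to_csv(driver.page_source, 'perfTableSort') to a file opened with open('industry.csv', 'w', newline='') would save the table. If lxml is installed, pd.read_html(io.StringIO(driver.page_source)) is the shorter route and gives you DataFrames you can pass straight to to_excel.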
Upvotes: 1
Reputation: 445
Try this (note that the form is POSTed to a slightly different URL, without the /sectors/ path segment):
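from io import StringIO
import requests
import pandas as pd
url = 'https://eresearch.fidelity.com/eresearch/markets_sectors/si_performance.jhtml'
industry = {'tab': 'industry'}
sector = {'tab': 'sector'}
r = requests.post(url, data=industry)  # the tab is chosen by the form data, not the URL
#soup = BeautifulSoup(r.content, 'html.parser')
#sectors = soup.find("table", id="perfTableSort")
df_list = pd.read_html(StringIO(r.text))  # StringIO avoids the literal-HTML deprecation warning in newer pandas
df = df_list[0]
df.head()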
Now you can pass data=industry or data=sector to get the desired result.
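Since only the form data changes, the same request can be repeated for both tabs and the results written to separate sheets, which is what the question's to_excel(..., sheet_name="Industry") line was aiming for. A minimal sketch, assuming the POST endpoint and the tab values sector and industry from this answer still work (not verified here). The helper names fetch_tab_html, fetch_all_tabs and save_tabs are hypothetical, and the post parameter exists only so the HTTP call can be swapped out (e.g. for a requests.Session().post or a test stub).

```python
import io

import pandas as pd
import requests

# Endpoint and form field taken from the answer above; whether the site still
# accepts this POST is an assumption.
URL = 'https://eresearch.fidelity.com/eresearch/markets_sectors/si_performance.jhtml'
TABS = ('sector', 'industry')


def fetch_tab_html(tab, post=requests.post):
    """POST {'tab': tab} as form data and return the response HTML."""
    r = post(URL, data={'tab': tab}, timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return r.text


def fetch_all_tabs(tabs=TABS, post=requests.post):
    """Return {tab: first HTML table on that tab as a DataFrame}.

    pd.read_html needs lxml (or BeautifulSoup plus html5lib) installed.
    """
    return {
        tab: pd.read_html(io.StringIO(fetch_tab_html(tab, post)))[0]
        for tab in tabs
    }


def save_tabs(frames, path='SectorPerf.xlsx'):
    """Write each tab's DataFrame to its own sheet (.xlsx output needs openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for tab, df in frames.items():
            df.to_excel(writer, sheet_name=tab.capitalize())
```

Calling save_tabs(fetch_all_tabs()) would then produce SectorPerf.xlsx with a Sector sheet and an Industry sheet. Wrapping the HTML in io.StringIO avoids the FutureWarning that pandas 2.1+ emits when a literal HTML string is passed to read_html.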
Upvotes: 2