Reputation: 87
I'm a complete newbie in scraping and I'm trying to scrape https://fr.finance.yahoo.com and I can't figure out what I'm doing wrong.
My goal is to scrape the index name, current level and the change(both in value and in %)
Here is the code I have used:
import urllib.request
from bs4 import BeautifulSoup
url = 'https://fr.finance.yahoo.com'
request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')
main_table = soup.find("div",attrs={'data-reactid':'12'})
print(main_table)
links = main_table.find_all("li", class_=' D(ib) Bxz(bb) Bdc($seperatorColor) Mend(16px) BdEnd ')
print(links)
However, the print(links)
comes out empty. Could someone please assist? Any help would be highly appreciated as I have been trying to figure this out for a few days now.
Upvotes: 1
Views: 451
Reputation: 164
Although the better way to get all the fields is parse and process the relevant script tag, this is one of the ways you can get all them.
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://fr.finance.yahoo.com/'
r = requests.get(url,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(r.text,'html.parser')
df = pd.DataFrame(columns=['Index Name','Current Level','Value','Percentage Change'])
for item in soup.select("[id='market-summary'] li"):
index_name = item.select_one("a").contents[1]
current_level = ''.join(item.select_one("a > span").text.split())
value = ''.join(item.select_one("a")['aria-label'].split("ou")[1].split("points")[0].split())
percentage_change = ''.join(item.select_one("a > span + span").text.split())
df = df.append({'Index Name':index_name, 'Current Level':current_level,'Value':value,'Percentage Change':percentage_change}, ignore_index=True)
print(df)
Output are like:
Index Name Current Level Value Percentage Change
0 CAC 40 4444,56 -0,88 -0,02%
1 Euro Stoxx 50 2905,47 0,49 +0,02%
2 Dow Jones 24438,63 -35,49 -0,15%
3 EUR/USD 1,0906 -0,0044 -0,40%
4 Gold future 1734,10 12,20 +0,71%
5 BTC-EUR 8443,23 161,79 +1,95%
6 CMC Crypto 200 185,66 4,42 +2,44%
7 Pétrole WTI 33,28 -0,64 -1,89%
8 DAX 11073,87 7,94 +0,07%
9 FTSE 100 5993,28 -21,97 -0,37%
10 Nasdaq 9315,26 30,38 +0,33%
11 S&P 500 2951,75 3,24 +0,11%
12 Nikkei 225 20388,16 -164,15 -0,80%
13 HANG SENG 22930,14 -1349,89 -5,56%
14 GBP/USD 1,2177 -0,0051 -0,41%
Upvotes: 1
Reputation: 2901
I think you need to fix your element selection.
For example the following code:
import urllib.request
from bs4 import BeautifulSoup
url = 'https://fr.finance.yahoo.com'
request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')
main_table = soup.find(id="market-summary")
links = main_table.find_all("a")
for i in links:
print(i.attrs["aria-label"])
Gives output text having index name, % change, change, and value:
CAC 40 a augmenté de 0,37 % ou 16,55 points pour atteindre 4 461,99 points
Euro Stoxx 50 a augmenté de 0,28 % ou 8,16 points pour atteindre 2 913,14 points
Dow Jones a diminué de -0,63 % ou -153,98 points pour atteindre 24 320,14 points
EUR/USD a diminué de -0,49 % ou -0,0054 points pour atteindre 1,0897 points
Gold future a augmenté de 0,88 % ou 15,10 points pour atteindre 1 737,00 points
a augmenté de 1,46 % ou 121,30 points pour atteindre 8 402,74 points
CMC Crypto 200 a augmenté de 1,60 % ou 2,90 points pour atteindre 184,14 points
Pétrole WTI a diminué de -3,95 % ou -1,34 points pour atteindre 32,58 points
DAX a augmenté de 0,29 % ou 32,27 points pour atteindre 11 098,20 points
FTSE 100 a diminué de -0,39 % ou -23,18 points pour atteindre 5 992,07 points
Nasdaq a diminué de -0,30 % ou -28,25 points pour atteindre 9 256,63 points
S&P 500 a diminué de -0,43 % ou -12,62 points pour atteindre 2 935,89 points
Nikkei 225 a diminué de -0,80 % ou -164,15 points pour atteindre 20 388,16 points
HANG SENG a diminué de -5,56 % ou -1 349,89 points pour atteindre 22 930,14 points
GBP/USD a diminué de -0,34 % ou -0,0041 points pour atteindre 1,2186 points
Upvotes: 1
Reputation: 33384
Try following css
selector to get all the links.
import urllib
from bs4 import BeautifulSoup
url = 'https://fr.finance.yahoo.com'
request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')
links=[link['href'] for link in soup.select("ul#market-summary a")]
print(links)
Output:
['/quote/^FCHI?p=^FCHI', '/quote/^STOXX50E?p=^STOXX50E', '/quote/^DJI?p=^DJI', '/quote/EURUSD=X?p=EURUSD=X', '/quote/GC=F?p=GC=F', '/quote/BTC-EUR?p=BTC-EUR', '/quote/^CMC200?p=^CMC200', '/quote/CL=F?p=CL=F', '/quote/^GDAXI?p=^GDAXI', '/quote/^FTSE?p=^FTSE', '/quote/^IXIC?p=^IXIC', '/quote/^GSPC?p=^GSPC', '/quote/^N225?p=^N225', '/quote/^HSI?p=^HSI', '/quote/GBPUSD=X?p=GBPUSD=X']
Upvotes: 0