Fabulini

Reputation: 171

Scraping a list of prices using Python

I am trying to analyze the data on this website: Electricity prices

I tried to do it using Beautiful Soup:

from bs4 import BeautifulSoup
import requests
page = requests.get('https://transparency.entsoe.eu/transmission-domain/r2/dayAheadPrices/show?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=01.10.2018+00:00%7CCET%7CDAY&biddingZone.values=CTY%7C10YAT-APG------L!BZN%7C10YAT-APG------L&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)')
soup = BeautifulSoup(page.text, 'html.parser')
price_hide = soup.find(class_='dv-value-cell')
print(price_hide)

So far, this only returns the first cell:

<td class="dv-value-cell">
<span       onclick="showDetail('eu.entsoe.emfip.transmission_domain.r2.presentation.entity.DayAheadPricesMongoEntity', '5bb0b150623a7295d97e9b6d', '2018-09-30T22:00:00.000Z', 'PRICE', 'CET');">59.53</span>

But how do I scrape the whole table?

Upvotes: 2

Views: 654

Answers (4)

Wertartem

Reputation: 237

from bs4 import BeautifulSoup
import requests
page = requests.get('https://transparency.entsoe.eu/transmission-domain/r2/dayAheadPrices/show?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=01.10.2018+00:00%7CCET%7CDAY&biddingZone.values=CTY%7C10YAT-APG------L!BZN%7C10YAT-APG------L&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)')
soup = BeautifulSoup(page.text, 'html.parser')
price_hide = soup.find_all('tr')
for price in price_hide:
    print(''.join(price.get_text("|", strip=True).split()))

With the following output:

MTU|Day-aheadPrice
[EUR/MWh]
00:00-01:00|59.53
01:00-02:00|56.10
02:00-03:00|51.41
03:00-04:00|47.38
04:00-05:00|47.59
05:00-06:00|51.61
06:00-07:00|69.13
07:00-08:00|77.32
08:00-09:00|84.97
09:00-10:00|79.56
10:00-11:00|73.70
11:00-12:00|72.00
12:00-13:00|65.20
13:00-14:00|62.05
14:00-15:00|61.96
15:00-16:00|62.41
16:00-17:00|61.98
17:00-18:00|60.42
18:00-19:00|69.93
19:00-20:00|75.00
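Since the goal is to analyze the prices, the pipe-delimited strings above can be turned into numbers. The following is only a minimal sketch: `parse_row` is a hypothetical helper name, and the `rows` list is a hard-coded sample copied from the output above rather than live data.

```python
# A few rows copied from the output above; in practice these would be the
# strings built inside the loop instead of a hard-coded list.
rows = [
    "MTU|Day-aheadPrice",
    "00:00-01:00|59.53",
    "01:00-02:00|56.10",
    "02:00-03:00|51.41",
]


def parse_row(row):
    """Return (interval, price) or None for rows without a numeric price."""
    interval, sep, value = row.partition("|")
    if not sep:
        return None  # no delimiter, e.g. a stray header fragment
    try:
        return interval, float(value)
    except ValueError:
        return None  # header rows such as "MTU|Day-aheadPrice"


# Keep only the rows that parsed into (interval, price) pairs.
prices = [parsed for parsed in map(parse_row, rows) if parsed is not None]
average = sum(price for _, price in prices) / len(prices)

print(prices)
print(round(average, 2))
```

Skipping rows that fail to parse keeps the header lines out of the result without having to hard-code how many of them there are.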

Upvotes: 0

Sruthi

Reputation: 3018

First find all the td tags for the timestamps and the prices, then extract the text inside the span tag of each price cell:

# `soup` is the BeautifulSoup object created in the question
timestamps = soup.find_all("td", class_="first")
prices = soup.find_all("td", class_="dv-value-cell")

for t, p in zip(timestamps, prices):
    print(t.text.strip(), " ", p.span.text.strip())


00:00 - 01:00   59.53
01:00 - 02:00   56.10
02:00 - 03:00   51.41
03:00 - 04:00   47.38
04:00 - 05:00   47.59

Upvotes: 2

Mr.AK

Reputation: 26

soup.find() returns only the first matching element. Use soup.find_all() to get every match, then loop over the results to extract the values you need.
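To illustrate the difference, here is a small sketch that runs both calls against an inline HTML snippet. `SAMPLE_HTML` is a hypothetical, trimmed-down stand-in for the real page (only the class names `first` and `dv-value-cell` are taken from this thread), so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page markup (an assumption, not a copy).
SAMPLE_HTML = """
<table>
  <tr><td class="first">00:00 - 01:00</td>
      <td class="dv-value-cell"><span>59.53</span></td></tr>
  <tr><td class="first">01:00 - 02:00</td>
      <td class="dv-value-cell"><span>56.10</span></td></tr>
  <tr><td class="first">02:00 - 03:00</td>
      <td class="dv-value-cell"><span>51.41</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag (or None if nothing matches).
first_cell = soup.find(class_="dv-value-cell")
first_price = first_cell.get_text(strip=True)

# find_all() returns every matching tag, in document order.
all_cells = soup.find_all(class_="dv-value-cell")
all_prices = [float(cell.get_text(strip=True)) for cell in all_cells]

print(first_price)
print(all_prices)
```

Converting to float inside the list comprehension is one possible form of the "further logic"; keeping the raw strings works just as well if the values are only going to be printed.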

Upvotes: 0

Kipr

Reputation: 1076

Is this what you are looking for?

from bs4 import BeautifulSoup
import requests
page = requests.get('https://transparency.entsoe.eu/transmission-domain/r2/dayAheadPrices/show?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=01.10.2018+00:00%7CCET%7CDAY&biddingZone.values=CTY%7C10YAT-APG------L!BZN%7C10YAT-APG------L&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)')
soup = BeautifulSoup(page.text, 'html.parser')
price_hide = soup.find_all(class_='dv-value-cell')
for price in price_hide:
    print(price.text.strip())

With the following output:

59.53
56.10
51.41
47.38
47.59
51.61
69.13
77.32
...

Upvotes: 0
