Reputation: 29
I am new to R and trying to scrape the map data from the following webpage: https://www.svk.se/en/national-grid/the-control-room/. The map is called "The flow of electricity". I am trying to scrape the capacity numbers (in blue) and the corresponding countries. So far I could not find a solution on how to find the countries' names in the HTML code and consequently scrape them.
Here is an example of data I need:
Would you have any idea?
Thanks a lot in advance.
Upvotes: 1
Views: 144
Reputation: 84465
The flow data comes from an API call so you need to make an additional xhr (to an url you can find in the network tab via dev tools ) to get this data. You don't need to specify values for the timestamp (Ticks
) and random (rnd
) params in the querystring.
library(jsonlite)
data <- jsonlite::read_json('https://www.svk.se/Proxy/Proxy/?a=http://driftsdata.statnett.no/restapi/PhysicalFlowMap/GetFlow?Ticks=&rnd=')
As a dataframe:
library(jsonlite)
library (plyr)
data <- jsonlite::read_json('https://www.svk.se/Proxy/Proxy/?a=http://driftsdata.statnett.no/restapi/PhysicalFlowMap/GetFlow?Ticks=&rnd=')
df <- ldply (data, data.frame)
Upvotes: 0
Reputation: 388807
The data is not in the table, hence we need to extract all the information individually.
Here is a way to do this using rvest
.
library(rvest)
url <-'https://www.svk.se/en/national-grid/the-control-room/'
webpage <- url %>% read_html() %>%html_nodes('div.island')
tibble::tibble(country = webpage %>% html_nodes('span.country') %>% html_text(),
watt = webpage %>% html_nodes('span.watt') %>% html_text() %>%
gsub('\\s', '', .) %>% as.numeric(),
unit = webpage %>% html_nodes('span.unit') %>% html_text())
# country watt unit
# <chr> <dbl> <chr>
#1 SWEDEN 3761 MW
#2 DENMARK 201 MW
#3 NORWAY 2296 MW
#4 FINLAND 1311 MW
#5 ESTONIA 632 MW
#6 LATVIA 177 MW
#7 LITHUANIA 1071 MW
Upvotes: 2