Scraping historical data from coinmarketcap

Question

I don't usually struggle with scraping tables from the web, but for some reason when I try to scrape the historical data from the following page I can't select the table I want.

This is the link and my code:

library(tidyverse)
library(rvest)

url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/")
  
table <- url %>% 
  html_table() %>% .[[1]] %>% as.data.frame()

Thanks.

Bas · Accepted Answer

The data is well hidden in a script element on the page. It is loaded into the table dynamically via JavaScript, so the table is not filled in the HTML that read_html() downloads, which is why html_table() can't find it.
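
One way to check for this situation is to compare what the static HTML contains with what a script element carries. The snippet below is only a sketch on a made-up page: the HTML string, the "page-data" ID and the JSON fields are assumptions for illustration, not the real coinmarketcap markup.

```r
library(rvest)

# Hypothetical stand-in for a JavaScript-driven page: the table has no rows
# in the HTML the server sends, and the data sits as JSON in a script element.
html <- '<html><body>
  <table id="history"><tbody></tbody></table>
  <script id="page-data" type="application/json">
    {"props": {"rows": [{"day": "2020-10-10", "open": 11059.14}]}}
  </script>
</body></html>'

page <- read_html(html)

# Count the table rows present in the static HTML; here there are none,
# so there is nothing for html_table() to pick up.
n_rows <- length(html_nodes(page, "table tr"))
n_rows

# The script element holds the data; html_text() returns its raw JSON,
# which jsonlite::fromJSON() turns into nested lists and data frames.
payload <- page %>%
  html_node("#page-data") %>%
  html_text() %>%
  jsonlite::fromJSON()

payload$props$rows
```

If the row count is zero (or the table is missing) while a script element contains JSON, the table is most likely built in the browser.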

The following extracts the data from that script element (with ID __NEXT_DATA__).

library(tidyverse)
library(rvest)

url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/")

table <- url %>% 
  html_node("#__NEXT_DATA__") %>%
  html_text() %>%
  jsonlite::fromJSON()

table$props$initialState$cryptocurrency$ohlcvHistorical[[1]]$quotes

which gives

                  time_open               time_close                time_high                 time_low quote.USD.open quote.USD.high
1  2020-10-10T00:00:00.000Z 2020-10-10T23:59:59.999Z 2020-10-10T03:16:44.000Z 2020-10-10T00:01:41.000Z       11059.14       11442.21
2  2020-10-11T00:00:00.000Z 2020-10-11T23:59:59.999Z 2020-10-11T15:31:43.000Z 2020-10-11T00:52:06.000Z       11296.08       11428.81
...
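
The quote.USD.* names in that printout come from a nested data frame column, and the timestamps are plain strings. A possible follow-up, sketched here on a small stand-in for the quotes element (two records trimmed to a few fields, with the values shown above; the real element has more columns), is to flatten the nesting and parse the dates. The `date` column name is my own choice.

```r
library(jsonlite)

# Stand-in for the `quotes` element, trimmed to a few fields.
json <- '[
  {"time_open": "2020-10-10T00:00:00.000Z",
   "time_close": "2020-10-10T23:59:59.999Z",
   "quote": {"USD": {"open": 11059.14, "high": 11442.21}}},
  {"time_open": "2020-10-11T00:00:00.000Z",
   "time_close": "2020-10-11T23:59:59.999Z",
   "quote": {"USD": {"open": 11296.08, "high": 11428.81}}}
]'

quotes <- fromJSON(json)

# fromJSON() keeps `quote` as a nested data frame. jsonlite::flatten() turns
# it into ordinary columns such as quote.USD.open. The namespace is spelled
# out because purrr (loaded by tidyverse) also exports a flatten().
flat <- jsonlite::flatten(quotes)

# The timestamps are ISO 8601 strings; keep the date part of time_open.
flat$date <- as.Date(substr(flat$time_open, 1, 10))

flat[, c("date", "quote.USD.open", "quote.USD.high")]
```

After this step the columns can be used like any other data frame columns, for example in dplyr or ggplot2.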
