TobKel


R Scrape HTML table from Yahoo Finance

I want to scrape a table from Yahoo Finance and load it into a data frame. Unfortunately, I don't really know how to do this with the rvest package.

Here is a first approach:

library(tidyverse)
library(rvest)

url<-"https://finance.yahoo.com/calendar/ipo?from=2021-02-21&to=2021-02-27&day=2021-02-23"

url %>%
  html() %>%
  html_nodes(xpath="table") %>%
  html_table()

As expected, the code does not work. Can someone help me?

I want to get the IPO table shown on that page as a data frame.

Many thanks in advance!

Upvotes: 0


Answers (2)

Daniel D


Here is the simplest way of solving your problem and it keeps the headers too :)

library(tidyverse)
library(rvest)

url<-"https://finance.yahoo.com/calendar/ipo?from=2021-02-21&to=2021-02-27&day=2021-02-23"

# Scrape the data

df <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="cal-res-table"]') %>% 
  as.character() %>% 
  XML::readHTMLTable()

# df is a list of two tables (as you can see from the website) - pick only the first list item

tbl <- as.data.frame(df[1])

# print your table
tbl
#>   NULL..Symbol.                                NULL.Company NULL.Exchange
#> 1         VELOU            Velocity Acquisition Corp. Units        Nasdaq
#> 2         FTAAU          FTAC Athena Acquisition Corp. Unit        Nasdaq
#> 3         CMIIU               CM Life Sciences II Inc. Unit        Nasdaq
#> 4                                            Metropress Ltd           LSE
#> 5      CTWO.P.V                        County Capital 2 Ltd          TSXV
#> 6         GSEVU              Gores Holdings VII, Inc. Units        Nasdaq
#> 7          NVOS Novo Integrated Sciences, Inc. Common Stock        Nasdaq
#> 8         SLAMU                             Slam Corp. Unit        Nasdaq
#>      NULL.Date NULL.Price.Range NULL.Price NULL.Currency NULL.Shares
#> 1 Feb 23, 2021    10.00 - 10.00          -           USD           -
#> 2 Feb 23, 2021                -          -           USD           -
#> 3 Feb 23, 2021    10.00 - 10.00          -           USD           -
#> 4 Feb 01, 2021                -          6           GBP    45452752
#> 5 Nov 19, 2020      0.08 - 0.08        0.1           CAD     6000000
#> 6 Feb 23, 2021                -          -           USD           -
#> 7 Feb 23, 2021                -          -           USD           -
#> 8 Feb 23, 2021    10.00 - 10.00          -           USD           -
#>   NULL.Actions
#> 1     Expected
#> 2     Expected
#> 3     Expected
#> 4       Priced
#> 5       Priced
#> 6     Expected
#> 7     Expected
#> 8     Expected

You might want to clean up those column names, though. :)
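As a rough sketch of that cleanup, the "NULL." prefix and the stray dots can be stripped with base R regular expressions. The small data frame below is a made-up stand-in that only mimics the column names printed above, so the snippet runs without hitting the website; the exact patterns are an assumption based on those names.

```r
# Stand-in for the scraped table: only the column names matter here.
tbl <- data.frame(
  a = c("VELOU", "FTAAU"),
  b = c("Velocity Acquisition Corp. Units", "FTAC Athena Acquisition Corp. Unit"),
  c = c("10.00 - 10.00", "-"),
  stringsAsFactors = FALSE
)
names(tbl) <- c("NULL..Symbol.", "NULL.Company", "NULL.Price.Range")

clean <- names(tbl)
clean <- sub("^NULL\\.+", "", clean)  # drop the leading "NULL." prefix
clean <- sub("\\.+$", "", clean)      # drop trailing dots, e.g. "Symbol."
clean <- gsub("\\.+", " ", clean)     # remaining dots were spaces: "Price.Range"
names(tbl) <- clean

names(tbl)
```

Extracting the list element with df[[1]] instead of df[1] may also avoid the prefix in the first place, since the prefix appears to come from as.data.frame() naming the columns of an unnamed list element.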

Upvotes: 1

Ronak Shah


Unfortunately, the table is not easily extractable using html_table(). Here's a way to extract the individual cell values from the table and do some post-processing to get the data into a data frame.

library(rvest)

url<-"https://finance.yahoo.com/calendar/ipo?from=2021-02-21&to=2021-02-27&day=2021-02-23"

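# Read the page and keep the first <table> node
tab1 <- url %>%
  read_html() %>%
  html_nodes('table') %>%
  .[[1]]

# Column names come from the header cells
header <- tab1 %>% html_nodes('th') %>% html_text()

# Collect every body cell as text, then fold the flat vector into rows of 9 columns
result <- tab1 %>%
  html_nodes('tr.simpTblRow td') %>%
  html_text() %>%
  matrix(ncol = 9, byrow = TRUE) %>%
  as.data.frame()
names(result) <- header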

result

#    Symbol                                     Company Exchange
#1    VELOU            Velocity Acquisition Corp. Units   Nasdaq
#2    FTAAU          FTAC Athena Acquisition Corp. Unit   Nasdaq
#3    CMIIU               CM Life Sciences II Inc. Unit   Nasdaq
#4                                       Metropress Ltd      LSE
#5 CTWO.P.V                        County Capital 2 Ltd     TSXV
#6    GSEVU              Gores Holdings VII, Inc. Units   Nasdaq
#7     NVOS Novo Integrated Sciences, Inc. Common Stock   Nasdaq
#8    SLAMU                             Slam Corp. Unit   Nasdaq

#          Date   Price Range Price Currency   Shares  Actions
#1 Feb 23, 2021 10.00 - 10.00     -      USD        - Expected
#2 Feb 23, 2021             -     -      USD        - Expected
#3 Feb 23, 2021 10.00 - 10.00     -      USD        - Expected
#4 Feb 01, 2021             -     6      GBP 45452752   Priced
#5 Nov 19, 2020   0.08 - 0.08   0.1      CAD  6000000   Priced
#6 Feb 23, 2021             -     -      USD        - Expected
#7 Feb 23, 2021             -     -      USD        - Expected
#8 Feb 23, 2021 10.00 - 10.00     -      USD        - Expected
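
Every column in that result is still character, and missing values show up as "-" or as empty strings. A possible follow-up, sketched here on a hand-typed subset of the rows above rather than on a live scrape, is to turn those placeholders into NA and convert the numeric and date columns. The date format string is an assumption based on the printed values, and %b relies on an English (or C) locale.

```r
# Hand-typed subset of the scraped rows, all columns as character.
result <- data.frame(
  Symbol = c("VELOU", "", "CTWO.P.V"),
  Date   = c("Feb 23, 2021", "Feb 01, 2021", "Nov 19, 2020"),
  Price  = c("-", "6", "0.1"),
  Shares = c("-", "45452752", "6000000"),
  stringsAsFactors = FALSE
)

# Treat "-" and empty strings as missing values.
result[result == "-" | result == ""] <- NA

# Convert the columns that hold numbers and dates.
result$Price  <- as.numeric(result$Price)
result$Shares <- as.numeric(result$Shares)
result$Date   <- as.Date(result$Date, format = "%b %d, %Y")  # locale-dependent month names

str(result)
```

The Price Range column is left out of this sketch because splitting "10.00 - 10.00" into a low and a high value is a separate decision.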

Upvotes: 1
