Mudassar

Reputation: 95

R html_nodes() function giving error Unexpected character '$'

I am trying to extract financial data from Yahoo Finance. When I run the code below, I get this error:

"Error in tokenize(css) : Unexpected character '$' found at position 19"

    library(rvest)

    urlYCashflow <- "https://au.finance.yahoo.com/quote/MSFT/cash-flow?p=MSFT"
    webpageYCashflow <- read_html(urlYCashflow)
    # This selector is the one that triggers the tokenize() error
    node1 <- webpageYCashflow %>%
      html_nodes('D(tbr).fi-row.Bgc($hoverBgColor):h') %>%
      html_text()

Is there any way to work around the `$`, for example by replacing it in the XML document, or is there another approach? I also tried an XPath expression, but the result is always character(0):

    node1 <- webpageYCashflow %>%
      html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[3]/div[1]/div/div[2]/div[7]/div[2]/div[3]/div[1]/div[2]/span') %>%
      html_text()

Upvotes: 2

Views: 1064

Answers (1)

QHarr

Reputation: 84465

Which value(s) are you after specifically? You could use the following: https://stackoverflow.com/a/58337027/6241235 to get all values.

Your CSS selector is syntactically invalid. An unescaped `$` is not legal in a selector (it only appears as part of the ends-with attribute operator `$=`), which is why the tokenizer stops there, and `:h` would be parsed as a pseudo-class. The parentheses in the class names would also need escaping, and the selector is missing the leading `.` for the first class. Rather than escaping the whole multi-valued class, you can select on the single class name `.fi-row` to get the rows.

To get the same value your XPath targets, select the last row and then its second span:

    library(rvest)
    library(magrittr)

    page <- read_html('https://au.finance.yahoo.com/quote/MSFT/cash-flow?p=MSFT')
    # Last .fi-row, then the text of its second <span>
    last_row <- tail(page %>% html_nodes('.fi-row'), 1)
    free_cash_flow <- last_row %>% html_nodes('span') %>% `[[`(2) %>% html_text()
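
For anyone who wants to try this without hitting the live site, here is a minimal offline sketch. The markup is a hypothetical stand-in that only borrows the class names from the question; the real Yahoo Finance page may be structured differently, and the labels and numbers are invented for illustration. It shows two ways to reach the rows without putting `$` into a CSS selector: the plain `.fi-row` class, and an XPath `contains()` test, where `$` is harmless because it sits inside a string literal.

```r
library(rvest)

# Hypothetical stand-in markup (assumption): two rows that carry the
# same multi-valued class attribute quoted in the question.
html <- '<div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Net income</span><span>100</span></div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Free cash flow</span><span>200</span></div>
</div>'
page <- read_html(html)

# Option 1: select on the one class name that needs no escaping.
rows_css <- html_nodes(page, ".fi-row")

# Option 2: XPath; the "$" is inside a quoted string, so it is not parsed.
rows_xpath <- html_nodes(page, xpath = "//div[contains(@class, '$hoverBgColor')]")

# Last row, second <span>, as in the answer above.
last_row <- rows_css[[length(rows_css)]]
value <- html_text(html_nodes(last_row, "span")[[2]])
```

Both options should return the same two rows on this sample. On the live page, the number of matches depends on whatever markup the site serves at the time.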

Upvotes: 2
