Reputation: 85
I'm trying to extract a specific number from pages on https://ideas.repec.org/. More specifically, I'm looking for the number of search results, like this: IDEAS' search results
However, when I run the following code, I get an empty string:
library(rvest)
library(httr)  # provides GET()
x <- GET("https://ideas.repec.org/cgi-bin/htsearch?form=extended&wm=wrd&dt=range&ul=&q=labor&cmd=Search%21&wf=4BFF&s=R&db=01%2F01%2F1950&de=31%2F12%2F1950")
webpage <- read_html(x)
hits_html <- html_nodes(webpage, xpath = '//*[@id="content-block"]/p')
hits <- html_text(hits_html)
hits
[1] ""
Upvotes: 1
Views: 126
Reputation: 84465
You could extract it from the appropriate node with a regex. This assumes that the text before and after the number, and its case, stay constant. You could also make the match case-insensitive with (?i)found\\s+(\\d+)\\s+results.
library(rvest)
library(stringr)
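page <- read_html("https://ideas.repec.org/cgi-bin/htsearch?form=extended&wm=wrd&dt=range&ul=&q=labor&cmd=Search%21&wf=4BFF&s=R&db=01%2F01%2F1950&de=31%2F12%2F1950")
# Grab the text of the content block; html_text() already returns a character string
r <- page %>% html_node("#content-block") %>% html_text()
# Capture the digits between "Found" and "results"
x <- str_match_all(r, 'Found\\s+(\\d+)\\s+results')
# Column 2 of the first match matrix holds the captured number
print(x[[1]][, 2])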
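If you want the count as an integer and would rather not fail when the phrase is missing, something along these lines might help. It is only a sketch in base R (no stringr), using the case-insensitive pattern mentioned above. The helper name extract_hits and the sample text are made up for illustration; the real page text may be worded differently.

```r
# Hypothetical sample text standing in for html_text() of the #content-block node.
sample_text <- "IDEAS search\nFound 123 results for labor"

# extract_hits is an illustrative helper, not part of rvest or stringr.
extract_hits <- function(text) {
  # regexec() returns the full match plus capture groups; (?i) needs perl = TRUE
  m <- regmatches(
    text,
    regexec("(?i)found\\s+(\\d+)\\s+results", text, perl = TRUE)
  )[[1]]
  # No match: return NA instead of an empty result
  if (length(m) < 2) return(NA_integer_)
  as.integer(m[2])
}

hits <- extract_hits(sample_text)
print(hits)
```

With the sample text this prints 123. On the live page you would pass in the string produced by html_text() instead of sample_text.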
Upvotes: 1