Reputation: 1007
I am trying to retrieve the quote "I understood at a very early age...spirit of the universe." and the author's name "Alice Walker" from the following HTML:
<div id="qpos_4_3" class="m-brick grid-item boxy bqQt" style="position: absolute; left: 0px; top: 33815px;">
<div class="">
<a href="/quotes/quotes/a/alicewalke625815.html?src=t_age" class="b-qt qt_625815 oncl_q" title="view quote">I understood at a very early age that
in nature, I felt everything I should feel in church but never did.
Walking in the woods, I felt in touch with the universe and with the
spirit of the universe.
</a>
<a href="/quotes/authors/a/alice_walker.html" class="bq-aut qa_625815 oncl_a" title="view author">Alice Walker</a>
</div>
<div class="kw-box">
<a href="/quotes/topics/topic_nature.html" class="oncl_k" data-idx="0">Nature</a>,
</div>
I used Chrome's developer tools to get the XPath. The following code is intended to extract the quote, but it outputs character(0). What am I doing wrong?
link <- "https://www.brainyquote.com/quotes/topics/topic_age.html"
quote <- read_html(link)
quote %>%
html_nodes(xpath = '//*[@id="qpos_4_3"]/div[1]/a[1]') %>%
html_attr('view quote')
Upvotes: 1
Views: 262
Reputation: 6264
You were nearly there with your attempt. Note that you can extend your XPath expression to include the title you were trying to isolate with html_attr (which expects an attribute name such as title, not its value), but what you really wanted was xml_contents. I've added magrittr only for piping and readability; it is not otherwise required. I have also coerced the results to character, assuming you will use them as such further on.
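library(xml2)
library(magrittr)  # only needed for the %>% pipe

# Return the contents of the first <a> with the given title
# inside the <div> with the given id, as a character vector.
get_contents <- function(link, id, title) {
  xpath <- paste0(".//div[@id='", id, "']//a[@title='", title, "']")
  read_html(link) %>%
    xml_find_first(xpath) %>%
    xml_contents() %>%
    as.character()
}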
link <- "https://www.brainyquote.com/quotes/topics/topic_age.html"
id <- "qpos_1_10"
quote <- get_contents(link, id, "view quote")
# [1] "In our age there is no such thing as 'keeping out of politics.' All
# issues are political issues, and politics itself is a mass of lies,
# evasions, folly, hatred and schizophrenia."
author <- get_contents(link, id, "view author")
# [1] "George Orwell"
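To try the same idea without fetching the page, here is a minimal sketch that parses the markup from the question as a string (trimmed to the two links, with the wrapped attributes and quote text joined onto single lines). It uses xml_text() instead of xml_contents() plus as.character(), which is a substitution on my part, and the variable names are illustrative only. It also shows two behaviours worth knowing: asking for an attribute by its value gives NA, and an XPath that matches nothing gives character(0). Whether that second case is what happened on the live page is an assumption I have not verified.

```r
library(xml2)

# The question's markup, embedded as a string so no network access is needed.
html <- '
<div id="qpos_4_3" class="m-brick grid-item boxy bqQt">
  <div class="">
    <a href="/quotes/quotes/a/alicewalke625815.html?src=t_age" class="b-qt qt_625815 oncl_q" title="view quote">I understood at a very early age that in nature, I felt everything I should feel in church but never did. Walking in the woods, I felt in touch with the universe and with the spirit of the universe.</a>
    <a href="/quotes/authors/a/alice_walker.html" class="bq-aut qa_625815 oncl_a" title="view author">Alice Walker</a>
  </div>
</div>'

doc <- read_html(html)

# Select each link by the value of its title attribute.
quote_node  <- xml_find_first(doc, ".//div[@id='qpos_4_3']//a[@title='view quote']")
author_node <- xml_find_first(doc, ".//div[@id='qpos_4_3']//a[@title='view author']")

quote  <- xml_text(quote_node, trim = TRUE)
author <- xml_text(author_node, trim = TRUE)   # "Alice Walker"

# Attribute lookups take the attribute *name*; "view quote" is a value, not a name.
title_ok  <- xml_attr(quote_node, "title")       # "view quote"
title_bad <- xml_attr(quote_node, "view quote")  # NA

# An XPath that matches no nodes yields an empty node set, hence character(0).
no_match <- xml_text(xml_find_all(doc, ".//div[@id='no_such_id']//a"))
```

If the element ids on the live page differ from what the browser shows, matching on the title attribute alone (dropping the div id condition) and using xml_find_all() would return every quote or author on the page instead of a single one.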
Upvotes: 2