I would like to know how I can keep only the text between the tags after using rvest to scrape a specific attribute from a website URL. This is the output I get:
{xml_nodeset (11)}
[1] <td id="open">1.1041</td>
[2] <td id="open">1.1043</td>
[3] <td id="open">1.1049</td>
[4] <td id="open">1.1043</td>
[5] <td class="right" id="open">47.617</td>
[6] <td class="left" id="open">MA</td>
Ideally, I want to extract just the contained text and get this:
[1] 1.1041
[2] 1.1043
[3] 1.1049
[4] 1.1043
[5] 47.617
[6] MA
but so far, using the html_text function, I get a vector of strings with quotes around each value, which is not what I want:
[1] "1.1041" "1.1043" "1.1049" "1.1043" "47.617" "MA"
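The values are strings because html_text always returns text, and with the non-numeric last value MA the vector could not be all numbers anyway. R prints character vectors with quotes around each element, which is why you see quotes around the numbers.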
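The quotes are only part of how R prints a character vector; the strings themselves contain no quote characters. A minimal sketch, assuming the scraped values are already in a character vector (typed by hand here as q instead of coming from rvest), showing that print(..., quote = FALSE) or noquote() displays them without quotes:

```r
# Values as html_text() would return them, typed by hand for this sketch
q <- c("1.1041", "1.1043", "1.1049", "1.1043", "47.617", "MA")

# 6: the string is just 1.1041, with no quote characters stored in it
nchar(q[1])

# Print the same vector without the surrounding quotes
print(q, quote = FALSE)

# noquote() wraps the vector so it always prints without quotes
noquote(q)
```

Neither call changes the data; they only change how the vector is displayed.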
You can convert everything to numeric, but the last value, MA, would be coerced to NA:
q <- c("1.1041", "1.1043", "1.1049", "1.1043", "47.617", "MA")
as.numeric(q)
# Output of the previous command:
# [1] 1.1041 1.1043 1.1049 1.1043 47.6170 NA
# Warning message:
# NAs introduced by coercion
So you have to decide what format you want your data in.
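If you want to keep the numbers as numbers and the text label separately, one option is to convert the whole vector, suppress the expected warning, and use the NA positions to pick out the entries that did not parse. A minimal sketch, again with q typed by hand; the names converted, nums and labels are hypothetical:

```r
q <- c("1.1041", "1.1043", "1.1049", "1.1043", "47.617", "MA")

# Convert to numeric; "MA" cannot be parsed, so it becomes NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.numeric(q))

# Entries that parsed as numbers
nums <- converted[!is.na(converted)]

# Entries that did not parse (here only "MA")
labels <- q[is.na(converted)]

nums
labels
```

Note that this treats any unparseable entry (including empty strings) as a label, so it assumes the scraped cells contain either a number or text.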