Reputation: 480
Is there a way to convert HTML back to plain character text in R? For example:
v1 <- "This is the link <a href=https://google.com>Click here</a> also there is another link, <a href=https://yahoo.com>Click here</a>"
Expected output
"This is the link https://google.com also there is another link, https://yahoo.com"
Upvotes: 0
Views: 54
Reputation: 388797
You can use gsub to remove the text that you don't want.
gsub("<a href=(.*?)>.*?</a>", "\\1", v1)
#[1] "This is the link https://google.com also there is another link, https://yahoo.com"
This replaces every <a>...</a> element with the URL from its href attribute and leaves the rest of the text untouched.
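If the real input has quoted href values or extra attributes on the tag, the pattern above would probably need to be a bit more tolerant. Here is a rough sketch of one way that might look. The helper name links_to_text and the v2 sample string are made up for illustration and are not part of the question. A regex like this is not an HTML parser, so it only suits simple, well-formed anchors.

```r
# Hypothetical helper (not from the original answer): replace each
# <a ...>...</a> element with the value of its href attribute.
links_to_text <- function(x) {
  # Allow other attributes before/after href and optional quotes around the URL.
  # The capture group stops at a quote, a space, or the closing ">".
  pattern <- "<a\\s+[^>]*?href=[\"']?([^\"' >]+)[\"']?[^>]*>.*?</a>"
  gsub(pattern, "\\1", x, perl = TRUE)
}

# Input from the question
v1 <- "This is the link <a href=https://google.com>Click here</a> also there is another link, <a href=https://yahoo.com>Click here</a>"

# Assumed extra sample with a quoted href and an additional attribute
v2 <- "See <a class=\"ext\" href=\"https://example.com\">this page</a> for details"

links_to_text(v1)
links_to_text(v2)
```

perl = TRUE is used here so the lazy quantifiers and \\s behave as in PCRE. For anything more complicated than this, a proper HTML parser is likely the safer route.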
Upvotes: 1