niko

Reputation: 5281

Extracting image source from string in R

I am trying to scrape image sources from different websites, using rvest. The problem is that I end up with a vector of strings that contain the source, and I still need to extract the source from each one.
Here are the first few entries:

> string
{xml_nodeset (100)}
 [1] <td class="no-wrap currency-name" data-sort="Bitcoin">\n<img src="https://s2.coinmarketc ...
 [2] <td class="no-wrap currency-name" data-sort="Ethereum">\n<img src="https://s2.coinmarket ...
 [3] <td class="no-wrap currency-name" data-sort="Ripple">\n <img src="https://s2.coinmarketc ...

What I need is the part that comes after src=", so for the first one "https://s2.coinmarketcap.com/static/img/coins/16x16/1.png". The console doesn't show the full strings, but this is what appears after the dots (...), and more text follows it as well.

Any help is appreciated as I am a bit stuck here.

Upvotes: 1

Views: 251

Answers (2)

niko

Reputation: 5281

As pointed out in the comments, a regular expression should do it:

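# Strip everything up to and including "https://", and everything from ".png" onwards
myhtml <- gsub('^.*https://|\\.png.*$', "", as.character(string))
# Re-attach the prefix and the extension to rebuild the full URL
myhtml <- paste0("https://", myhtml, ".png")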

The first line removes everything up to and including https:// and everything from .png onwards, leaving only the middle part of the URL. The second line pastes https:// and .png back on, so that you end up with a valid source. This assumes each element contains exactly one .png URL.
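
An alternative that targets the src attribute itself, rather than https:// and .png, might look like the sketch below. It uses base R only. The sample strings are hypothetical: they mimic the entries shown in the question, and the text after the src attribute is made up, since the console output is truncated.

```r
# Hypothetical sample mimicking the entries in the question; everything
# after the src attribute is invented for illustration.
string <- c(
  '<td class="no-wrap currency-name" data-sort="Bitcoin">\n<img src="https://s2.coinmarketcap.com/static/img/coins/16x16/1.png" alt="Bitcoin">',
  '<td class="no-wrap currency-name" data-sort="Ethereum">\n<img src="https://s2.coinmarketcap.com/static/img/coins/16x16/1027.png" alt="Ethereum">'
)

# Find src=" followed by everything up to the next double quote.
m <- regexpr('src="[^"]*"', string)
src <- regmatches(string, m)

# Drop the leading src=" and the trailing quote.
src <- sub('^src="', "", sub('"$', "", src))
print(src)
```

Because it keys on src="...", this variant should also work for images that are not .png files or are not served over https.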

Upvotes: 1

Onyambu

Reputation: 79238

You can read the src attribute directly with html_attr() instead of parsing the string:

library(rvest)
read_html("https://coinmarketcap.com/coins/") %>%
  html_nodes("td img") %>%
  html_attr("src")

  [1] "https://s2.coinmarketcap.com/static/img/coins/16x16/1.png"            
  [2] "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1.png"   
  [3] "https://s2.coinmarketcap.com/static/img/coins/16x16/1027.png"         
  [4] "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1027.png"
  [5] "https://s2.coinmarketcap.com/static/img/coins/16x16/52.png"           
  [6] "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/52.png"  
  [7] "https://s2.coinmarketcap.com/static/img/coins/16x16/1831.png"         
  [8] "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1831.png"
    :
    :
    :
    :
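
Note that the result also contains the sparkline charts. If only the coin icons are wanted, one possible follow-up is to filter on the path, as in this sketch. The vector below is copied from the first eight entries of the output above, and the "/static/img/coins/" filter is an assumption based on those URLs.

```r
# First eight URLs copied from the output above.
src <- c(
  "https://s2.coinmarketcap.com/static/img/coins/16x16/1.png",
  "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1.png",
  "https://s2.coinmarketcap.com/static/img/coins/16x16/1027.png",
  "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1027.png",
  "https://s2.coinmarketcap.com/static/img/coins/16x16/52.png",
  "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/52.png",
  "https://s2.coinmarketcap.com/static/img/coins/16x16/1831.png",
  "https://s2.coinmarketcap.com/generated/sparklines/web/7d/usd/1831.png"
)

# Keep only the URLs whose path contains the coin icon directory
# (assumed from the URLs above); fixed = TRUE treats it as a literal string.
icons <- src[grepl("/static/img/coins/", src, fixed = TRUE)]
print(icons)
```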

Upvotes: 2
