psysky

Reputation: 3195

Convert hyperlinks from cells of HTML tables to text lines in R

I need the hyperlinks in the cells of HTML tables converted to text lines.

For example, given these hyperlinks:

Cruise_Reference <- c("https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html",
                      "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html") 

Accession <- c("https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_013183..",
               "https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_011637..")

The expected output is shown in a screenshot (image not included).

How to do it?

Upvotes: 0

Views: 271

Answers (2)

clemens

Reputation: 6813

You can use the tableHTML package to create such a table.

In order to display the URLs the way you want, you first need to extract the part you want to display:

Cruise_Reference <- c("https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html",
                      "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html")


Cruise_Reference_url_text <- sub('\\.html', '', sub('.*\\/', '', Cruise_Reference))

This returns:

[1] "CA020377" "US035632"
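As an alternative sketch for this extraction step, base R's basename() and tools::file_path_sans_ext() can pull out the file name and drop its extension without a regular expression. This assumes every URL ends in a file name with an ordinary extension such as .html; it is not part of the original answer.

```r
# Same URLs as in the question
Cruise_Reference <- c("https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html",
                      "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html")

# basename() keeps everything after the last "/"
file_names <- basename(Cruise_Reference)

# file_path_sans_ext() drops the trailing ".html"
Cruise_Reference_url_text <- tools::file_path_sans_ext(file_names)

print(Cruise_Reference_url_text)
# [1] "CA020377" "US035632"
```

The nested sub() calls remain the more flexible choice when the part to keep is not simply the file name.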

Next, wrap that text in an <a> tag and provide the URL as its href:

paste0('<a href="', Cruise_Reference, '">', 
                           Cruise_Reference_url_text,
                           '</a>')

This produces this HTML string:

[1] "<a href=\"https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html\">CA020377</a>"
[2] "<a href=\"https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html\">US035632</a>"

The same thing applies to the Accession# column.
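A minimal sketch of those two steps for the Accession values might look like the following. The URLs are kept exactly as truncated in the question (ending in ".."), and the variable names Accession_url_text and Accession_links are only illustrative. The pattern assumes the identifier sits between the last underscore and the first dot that follows it.

```r
# Accession URLs exactly as given (truncated) in the question
Accession <- c("https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_013183..",
               "https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_011637..")

# Inner sub(): drop everything up to and including the last underscore
# Outer sub(): drop everything from the first remaining dot onwards
Accession_url_text <- sub('\\..*$', '', sub('.*_', '', Accession))

# Wrap the short text in an <a> tag that points at the full URL
Accession_links <- paste0('<a href="', Accession, '">',
                          Accession_url_text,
                          '</a>')

print(Accession_url_text)
# [1] "013183" "011637"
```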

If you start with this data:

library(dplyr)

table_data <- tibble("#" = c(1, 28),
                     "Cruise Reference" = c("https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html",
                                            "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html"),
                     "Institute" = c(9421, 9421),
                     "#Cats" = c(435, 190), 
                     "Accession#" = c("https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_013183..",
                                      "https://www.nodc.noaa.gov/OC5/SELECT/accessions/readme_011637.."),
                     "Start Date" = c("8/ 1/2012", "1/ 9/2014"),
                     "End Date" = c("3/17/2013", "4/27/2014"),
                     "Orig. Cruise ID" = c("", "Q9900653"))

You can use the same code within mutate() to transform the two columns:

table_data <- table_data %>% 
  mutate(`Cruise Reference` = paste0('<a href="', `Cruise Reference`, '">', 
                                     sub('\\.html', '', sub('.*\\/', '', `Cruise Reference`)),
                                     '</a>')) %>% 
  mutate(`Accession#` = paste0('<a href="', `Accession#`, '">', 
                               sub('\\..*$', '', sub('.*_', '', `Accession#`)),
                               '</a>'))

The last step is to produce the tableHTML:

library(tableHTML)

table_data %>% 
  tableHTML(rownames = FALSE,
            escape = FALSE,
            widths = c(50, rep(100, 7))) %>% 
  add_css_header(css = list("background-color",
                            "lightgray"),
                 headers = 1:8) %>% 
  add_css_row(css = list("height", "50px"))

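The result is an HTML table in which the Cruise Reference and Accession# cells show the short identifiers as clickable links (image not included).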

Upvotes: 2

Alberto Burgos Plaza

Reputation: 11

You can use gsub with a capture group, like this:

gsub(x = "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html", pattern = "^.*/(.*)\\.html$", replacement = "\\1")
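Because gsub() and sub() are vectorised, the same idea can be applied to a whole column at once. The sketch below uses sub(), since each URL matches only once, and writes the pattern with an escaped dot and an end anchor so that only a literal ".html" suffix is removed. The variable name cruise_ids is illustrative.

```r
Cruise_Reference <- c("https://www.nodc.noaa.gov/OC5/SELECT/allcruises/CA020377.html",
                      "https://www.nodc.noaa.gov/OC5/SELECT/allcruises/US035632.html")

# "^.*/" consumes everything up to the last slash,
# "(.*)" captures the file name, and "\\.html$" matches the literal extension
cruise_ids <- sub("^.*/(.*)\\.html$", "\\1", Cruise_Reference)

print(cruise_ids)
# [1] "CA020377" "US035632"
```

A URL that does not end in ".html" is returned unchanged, which makes non-matching rows easy to spot.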

Upvotes: 1
