Reputation: 959
I am creating RDF Linked Data using R. Right now I have URIs like:
test:Value_ONE%20OR%20TWO
I instead want to create IRIs using the proper encoding. Conversion of URIs to IRIs is described here:
https://www.w3.org/International/iri-edit/draft-duerst-iri.html#URItoIRI
Can someone guide me with example R code to convert a percent-encoded URI to an IRI?
Upvotes: 1
Views: 240
Reputation: 865
You'll have to play around with the logic, but the code below works for the first example in the link you sent. Luckily, the bulk of the transformations can be done in base R. I've added the tidyverse just to suggest ways of doing this programmatically.
map is just the tidyverse's version of the apply family; it iterates over a list or a vector. map_int/map_chr can be replaced with sapply, map can be replaced with lapply, and map2 can be replaced with Map or mapply. stringr is your best friend whenever you want to do string manipulation (extraction and replacement) in R:
library(tidyverse)
testURI = 'http://www.example.org/D%C3%BCrst'
#testURI = 'test:Value_ONE%20OR%20TWO'
########################################
# extract any pattern that matches %\w\w
# "\w" is the regex class for a word character (letter, digit or underscore)
# the "\" must itself be escaped as "\\" inside an R string
########################################
extractPerc <- testURI %>%
str_extract_all(regex('(%\\w{2})+')) %>%
unlist()
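# decode each run of escapes, then take the code point of the result
# (map_int assumes each run decodes to exactly one character)
extractPercDecoded <- map_chr(extractPerc, URLdecode)
extractPercInt <- map_int(extractPercDecoded, utf8ToInt)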
############################################
# Keep as a list so the hex code isn't converted to its
# character representation or its numeric default
############################################
extractPercHex <- map(extractPercInt, as.hexmode)
#####################################################
# iterate over the matches and replace each %-escape with its hex form
# note: the result is an HTML-style numeric character reference (&#x..;),
# not the literal Unicode character
####################################################
newURI <- testURI
# a plain loop avoids modifying newURI from inside map2() via <<-
for (i in seq_along(extractPerc)) {
  newURI <- str_replace(newURI,
                        fixed(extractPerc[[i]]),  # match the escape literally
                        str_c('&#x', format(extractPercHex[[i]]), ';'))
}
newURI
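If the goal is an IRI that contains the actual Unicode characters rather than numeric character references, a base R sketch along the following lines might be closer to the URI-to-IRI procedure in the linked draft. The function name uri_to_iri is made up for illustration. It only decodes runs of escapes for non-ASCII bytes (0x80 and above) that form valid UTF-8, so ASCII escapes such as %20 stay percent-encoded. It is a simplification: it does not implement the draft's extra exclusions (for example bidirectional formatting characters), and it treats each run as all-or-nothing.

```r
# Hypothetical helper: convert percent-encoded UTF-8 sequences in a URI
# to literal characters, leaving ASCII escapes (e.g. %20) untouched.
uri_to_iri <- function(uri) {
  # Runs of escapes whose first hex digit is 8-F, i.e. bytes >= 0x80
  pattern <- "(%[89A-Fa-f][0-9A-Fa-f])+"
  m <- gregexpr(pattern, uri)
  runs <- regmatches(uri, m)[[1]]
  if (length(runs) == 0L) return(uri)

  decoded <- vapply(runs, function(run) {
    # "%C3%BC" -> c("C3", "BC") -> raw bytes c3 bc
    hex <- strsplit(substring(run, 2), "%", fixed = TRUE)[[1]]
    chr <- rawToChar(as.raw(strtoi(hex, base = 16L)))
    Encoding(chr) <- "UTF-8"
    # Keep the original escapes if the bytes are not valid UTF-8
    if (validUTF8(chr)) chr else run
  }, character(1), USE.NAMES = FALSE)

  # Put the decoded runs back in place of the matched escapes
  regmatches(uri, m) <- list(decoded)
  uri
}

uri_to_iri("http://www.example.org/D%C3%BCrst")
uri_to_iri("test:Value_ONE%20OR%20TWO")
```

The first call should give the URI with a literal ü in place of %C3%BC. The second should come back unchanged, because %20 is an ASCII space, which is not allowed unescaped in an IRI either.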
Upvotes: 1