Reputation: 7989
I am on the lookout for two R functions that would perform the following string rearrangements: (1) place the parts following a ", " in a string at the start of a string, e.g.
name="2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-"
should yield
"(E)-3,7-dimethyl-2,6-Octadien-1-ol"
(note that there could be any number of ", " in a string, or none at all, and that the parts after the ", " should be placed at the start of the string successively, starting from the end of the string. What would be the most efficient way of achieving this in R (without using loops etc)?
(2) place the parts between "<" and ">" at the start of a string and remove any ", ". E.g.
name="Pyrazine <2-acetyl-, 3-ethyl->"
should yield
"2-acetyl-3-ethyl-Pyrazine"
(this is a simpler gsub problem, right?) The part between the "<" and ">" could be in any place in the string though. E.g.
name="Cyclohexanol <4-tertbutyl-> acetate"
should yield
"4-tertbutyl-Cyclohexanol acetate"
Any thoughts would be welcome!
cheers, Tom
Upvotes: 0
Views: 173
Reputation: 81703
For the first problem:
name <- c("2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-",
"2,6-Octadien-1-ol,3,7-dimethyl-,(E)-")
sapply(strsplit(name, "(?<!\\d), ?", perl = TRUE), function(x)
paste(rev(x), collapse = ""))
# [1] "(E)-3,7-dimethyl-2,6-Octadien-1-ol" "(E)-3,7-dimethyl-2,6-Octadien-1-ol"
For the second problem:
name <- c("Pyrazine <2-acetyl-, 3-ethyl->",
"Cyclohexanol <4-tertbutyl-> acetate")
inside <- gsub(", ", "", sub("^.*<(.+)>.*$", "\\1", name))
outside <- sub("^(.*) <.*>(.*)$" , "\\1\\2", name)
paste0(inside, outside)
# [1] "2-acetyl-3-ethyl-Pyrazine" "4-tertbutyl-Cyclohexanol acetate"
Upvotes: 1