Reputation: 73
I have a string list here,
List <- c('C8 H12 O1 Na1', 'C15 H20 O7 Na1', 'C18 H24 O6', 'C24 H32 O9 Na1', 'C26 H38 O5 Na1')
And I would like to change it to
Listnew <- c('C8 H12 O1', 'C15 H20 O7', 'C18 H23 O6', 'C24 H32 O9', 'C26 H38 O5')
where any string containing Na had it removed, and any string that did not contain Na had the number after H reduced by 1. In this case, 'C18 H24 O6' from List was changed to 'C18 H23 O6'. This list is contained in a matrix. I know how to change strings based on one condition.
I think that I need to first create a TRUE/FALSE column indicating whether Na exists within each string, then use that to either subtract 1 from the H number or remove the Na. I have looked for similar questions, but I could not find an answer that worked for me.
Upvotes: 1
Views: 79
Reputation: 887048
With sub we can remove the Na\\d+ at the end of each string, and then use gsubfn to subtract 1 from the number after H in the strings that had no Na
library(gsubfn)
new <- sub("\\sNa\\d+$", "", List)
i1 <- grep("\\bNa\\d+$", List, invert = TRUE)
new[i1] <- gsubfn("H\\d+", ~ paste0(substring(x, 1, 1),
as.numeric(substring(x, 2))-1), new[i1])
new
#[1] "C8 H12 O1" "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"
or as @G. Grothendieck commented
new[i1] <- gsubfn("(H)(\\d+)", ~ paste0(x, as.numeric(y)-1), new[i1])
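If you would rather not load gsubfn, here is a rough base R sketch of the same two steps. It builds the TRUE/FALSE indicator mentioned in the question with grepl and uses regmatches<- to rewrite only the digits after H. The names has_na and res are made up for this illustration, and it assumes each string has at most one H token and one Na token.

```r
List <- c('C8 H12 O1 Na1', 'C15 H20 O7 Na1', 'C18 H24 O6', 'C24 H32 O9 Na1', 'C26 H38 O5 Na1')

# TRUE where the string contains an Na token (has_na is an illustrative name)
has_na <- grepl("\\bNa\\d+", List, perl = TRUE)

# Step 1: drop the Na token, and any space before it, from every string
res <- sub("\\s*\\bNa\\d+", "", List, perl = TRUE)

# Step 2: in the strings that had no Na, locate the digits right after H ...
m <- regexpr("(?<=\\bH)\\d+", res[!has_na], perl = TRUE)
# ... and replace them with the decremented value
regmatches(res[!has_na], m) <- as.character(as.integer(regmatches(res[!has_na], m)) - 1L)

res
```

The logical vector keeps the two conditions separate: the Na removal runs on everything, and the decrement only touches the elements where has_na is FALSE.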
Upvotes: 2
Reputation: 11128
Another way could be:
## subtract 1 from the number attached to H in every string
nums <- as.numeric(stringr::str_extract(List, '(?<=H)(\\d+)')) - 1
## indices of the strings where Na is missing
no_na <- grep('\\bNa\\d+\\b', List, invert = TRUE)
## build the replacement strings, with the decremented H number, only for those indices
replacement <- unlist(lapply(no_na, function(x) gsub('(H)(\\d+)', paste0('\\1', nums[x]), List[x], perl = TRUE)))
## write the replacements back into List
List[no_na] <- replacement
## finally, use gsub to replace Na followed by digits with '' and trim the leftover space
fout <- trimws(gsub('\\bNa\\d+\\b', '', List))
Output:
> fout
[1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9"
[5] "C26 H38 O5"
Upvotes: 1
Reputation: 16832
With some stringr functions, test for the presence of "\\bNa\\d+", extract the number after H, and decrease it.
library(stringr)
List %>%
  ifelse(str_detect(., "\\bNa\\d+"),
         .,
         str_replace(., "(?<=\\bH)\\d+", as.character(as.integer(str_extract(., "(?<=\\bH)(\\d+)")) - 1L))) %>%
  str_remove_all("\\bNa\\d+") %>%
  trimws()
#> [1] "C8 H12 O1" "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"
For a bit more legibility, pull the nested stuff out into a function.
decrease_h <- function(x) {
  if (!str_detect(x, "\\bNa\\d+")) {
    n <- as.integer(str_extract(x, "(?<=\\bH)(\\d+)")) - 1L
    str_replace(x, "(?<=\\bH)\\d+", as.character(n))
  } else {
    x
  }
}
List %>%
  purrr::map_chr(decrease_h) %>% # or use sapply
  str_remove_all("\\bNa\\d+") %>%
  trimws()
Upvotes: 0
Reputation: 7592
Here's an implementation of the idea I suggested in the comment: we break down the strings, and then either remove the Na1 or reduce the H by 1. Then we paste it all back together and return a vector.
sapply(strsplit(List, " "), function(x) {
  if (any(grepl("Na", x))) {
    x[grepl("Na", x)] <- ""
  } else {
    x[grepl("H", x)] <- paste0("H", readr::parse_number(x[grepl("H", x)]) - 1)
  }
  return(trimws(paste(x, collapse = " ")))
})
[1] "C8 H12 O1" "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"
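Since the question mentions that the strings live in a matrix, here is a rough sketch of applying the same split-and-paste idea to one column and assigning the result back in place. The matrix mat, its column names, and the helper adjust_formula are all made up for illustration, as the question does not show the real layout. This version uses only base R and matches whole tokens (^Na\\d+$ and ^H\\d+$) rather than any token containing the letters.

```r
# Hypothetical character matrix; the layout and column names are assumptions
mat <- cbind(id = c("a", "b", "c"),
             formula = c("C8 H12 O1 Na1", "C18 H24 O6", "C26 H38 O5 Na1"))

# Hypothetical helper: drop the Na token if present, otherwise decrement H
adjust_formula <- function(s) {
  vapply(strsplit(s, " "), function(tok) {
    is_na_tok <- grepl("^Na\\d+$", tok)
    if (any(is_na_tok)) {
      tok <- tok[!is_na_tok]                 # remove the Na token
    } else {
      is_h <- grepl("^H\\d+$", tok)          # find the H token
      tok[is_h] <- paste0("H", as.integer(sub("^H", "", tok[is_h])) - 1L)
    }
    paste(tok, collapse = " ")
  }, character(1))
}

# Overwrite just the one column; the rest of the matrix is untouched
mat[, "formula"] <- adjust_formula(mat[, "formula"])
mat[, "formula"]
```

Dropping the Na token instead of blanking it means there is no trailing space to trim afterwards.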
Upvotes: 0