Ragstock

Reputation: 73

Changing a list of strings based on certain conditions

I have a character vector of strings:

List <- c('C8 H12 O1 Na1', 'C15 H20 O7 Na1', 'C18 H24 O6', 'C24 H32 O9 Na1', 'C26 H38 O5 Na1')

I would like to change it to:

Listnew <- c('C8 H12 O1', 'C15 H20 O7', 'C18 H23 O6', 'C24 H32 O9', 'C26 H38 O5')

Any string containing Na should have the Na term removed, and any string without Na should have its H count reduced by 1. In this case, 'C18 H24 O6' from List becomes 'C18 H23 O6'. This list is contained in a matrix. I know how to change strings based on one condition.

I think I first need to create a TRUE/FALSE column indicating whether Na exists in each string, then use it to either subtract 1 from the H count or remove the Na. I have looked for similar questions but could not find an answer that worked for me.

Upvotes: 1

Views: 79

Answers (4)

akrun

Reputation: 887048

With sub we can remove the trailing Na\\d+ term; then, for the strings that had no Na, gsubfn subtracts 1 from the number after H.

library(gsubfn)
new <- sub("\\sNa\\d+$", "", List)
i1 <- grep("\\bNa\\d+$", List, invert = TRUE)
new[i1] <- gsubfn("H\\d+", ~ paste0(substring(x,  1, 1),
    as.numeric(substring(x, 2))-1), new[i1])
new
#[1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"

Or, as @G. Grothendieck commented, with capture groups:

new[i1] <- gsubfn("(H)(\\d+)", ~ paste0(x, as.numeric(y)-1), new[i1])     
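
If gsubfn is not available, the same two steps can be sketched in base R. This is not part of the original answer; it follows the logical-flag idea from the question, and the names has_na, no_na and m are chosen for illustration only. It assumes every string without Na contains an H term.

```r
# Input vector from the question
List <- c('C8 H12 O1 Na1', 'C15 H20 O7 Na1', 'C18 H24 O6',
          'C24 H32 O9 Na1', 'C26 H38 O5 Na1')

# TRUE where the string contains an Na term
has_na <- grepl("\\bNa\\d+", List)

# Step 1: drop the Na term together with the whitespace before it
Listnew <- sub("\\s*Na\\d+", "", List)

# Step 2: for the strings without Na, lower the number after H by 1
no_na <- Listnew[!has_na]
m <- regexpr("(?<=\\bH)\\d+", no_na, perl = TRUE)   # locate the digits after H
regmatches(no_na, m) <- as.character(as.integer(regmatches(no_na, m)) - 1L)
Listnew[!has_na] <- no_na

Listnew
#[1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"
```

The lookbehind (?<=\\bH) needs perl = TRUE; it matches only the digits, so the H itself is left in place.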

Upvotes: 2

PKumar

Reputation: 11128

Another way could be:

## extract the number attached to H in every string and subtract 1
nums <- as.numeric(stringr::str_extract(List, '(?<=H)(\\d+)')) - 1
## positions of the strings that contain no Na
no_na <- grep('\\bNa\\d+\\b', List, invert = TRUE)
## substitute the decremented H count, only in the strings without Na
replacement <- unlist(lapply(no_na, function(x)
  gsub('(H)(\\d+)', paste0('\\1', nums[x]), List[x], perl = TRUE)))
List[no_na] <- replacement
## finally, remove Na followed by digits and trim the leftover space
fout <- trimws(gsub('\\bNa\\d+\\b', '', List))

Output:

> fout
[1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9"
[5] "C26 H38 O5"

Upvotes: 1

camille

Reputation: 16832

With some stringr functions, test for the presence of "\\bNa\\d+"; where it is absent, extract the number after H and decrease it by 1. Then remove any Na term and trim the leftover whitespace.

library(stringr)

List %>%
  ifelse(str_detect(., "\\bNa\\d+"), 
         .,
         str_replace(., "(?<=\\bH)\\d+", as.character(as.integer(str_extract(., "(?<=\\bH)(\\d+)")) - 1L))) %>%
  str_remove_all("\\bNa\\d+") %>%
  trimws()
#> [1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"

For a bit more legibility, pull the nested stuff out into a function.

decrease_h <- function(x) {
  if (!str_detect(x, "\\bNa\\d+")) {
    n <- as.integer(str_extract(x, "(?<=\\bH)(\\d+)")) - 1L
    str_replace(x, "(?<=\\bH)\\d+", as.character(n))
  } else {
    x
  }
}

List %>%
  purrr::map_chr(decrease_h) %>% # or use sapply
  str_remove_all("\\bNa\\d+") %>%
  trimws()

Upvotes: 0

iod

Reputation: 7592

Here's an implementation of the idea I suggested in the comment: we split each string on spaces, and then either remove the Na1 part or reduce the H count by 1. Then we paste it all back together and return a vector.

sapply(strsplit(List, " "), function(x) {
  if (any(grepl("Na", x))) {
    x[grepl("Na", x)] <- ""   # blank out the Na part
  } else {
    x[grepl("H", x)] <- paste0("H", readr::parse_number(x[grepl("H", x)]) - 1)
  }
  trimws(paste(x, collapse = " "))
})

[1] "C8 H12 O1"  "C15 H20 O7" "C18 H23 O6" "C24 H32 O9" "C26 H38 O5"
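
The question mentions that the strings live in a matrix. A minimal base R sketch of the same split-and-rebuild idea applied to a matrix column might look like this. The helper adjust_formula, the matrix mat and its column names are all hypothetical; the real layout is not shown in the question.

```r
# Hypothetical helper: remove the Na part if present, otherwise lower H by 1
adjust_formula <- function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)
  vapply(tokens, function(tok) {
    is_na_term <- grepl("^Na\\d*$", tok)
    if (any(is_na_term)) {
      tok <- tok[!is_na_term]                 # drop the Na part
    } else {
      is_h <- grepl("^H\\d+$", tok)           # the hydrogen part, e.g. "H24"
      tok[is_h] <- paste0("H", as.integer(sub("^H", "", tok[is_h])) - 1L)
    }
    paste(tok, collapse = " ")
  }, character(1))
}

# Assumed layout: a character matrix with the formulas in a "formula" column
mat <- cbind(id = c("a", "b", "c"),
             formula = c("C8 H12 O1 Na1", "C18 H24 O6", "C26 H38 O5 Na1"))

# Overwrite the column in place with the adjusted strings
mat[, "formula"] <- adjust_formula(mat[, "formula"])
mat[, "formula"]
```

Anchoring the patterns with ^ and $ keeps the match to whole parts, so an element symbol that merely starts with H or Na followed by letters is left alone.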

Upvotes: 0
