Galina Polishchuk

Reputation: 602

R using gsub in a loop

I have the following vector of column names:

plot_variables <- c("Ser predicted (g/L)", "Ser initial (g/L)", "Ser experimental (g/L)", "Glu predicted (g/L)", "Glu initial (g/L)", "Glu experimental (g/L)", "Pro predicted (g/L)", ...)

And I have a glossary of those short names:

df_glossary <- data.frame(
  short = c("Cys", "Pro", "Phe", "Ser", "Glu", "Glc", ...),
  full = c("Cysteine", "Proline", "Phenylalanine", "Serine", "Glutamate", "Glucose", ...),
  stringsAsFactors = FALSE
)

I would like to match those two and have something like:

names_matching <- data.frame(
variable = c("Ser predicted (g/L)", "Ser initial (g/L)", "Ser experimental (g/L)", ...),
label = c("Serine predicted (g/L)", "Serine initial (g/L)", "Serine experimental (g/L)", ...)
)

Is there a more elegant way to do it than this:

pl<-unlist(plot_variables)

pl<-sapply(1:nrow(df_glossary) , function(x){
    pl<<- gsub(df_glossary$short[x], df_glossary$full[x],  pl, fixed = TRUE)
    })

pl <- pl[,nrow(df_glossary)] %>% data.frame()

names_matching <- cbind(plot_variables %>% data.frame, pl)

Upvotes: 0

Views: 464

Answers (2)

mysteRious

Reputation: 4294

I think what you're looking for is gsubfn in the gsubfn package. If you want to read the keys and values from another data frame, you'll have some wrangling to do, but in general here's how it works: the pattern is a plain alternation of the short names, and each match is replaced by the list element of the same name.

> library(gsubfn)
> gsubfn('Ser|Glu|Pro', 
     list('Ser'='Serine','Glu'='Glutamate','Pro'='Proline'), plot_variables)
[1] "Serine predicted (g/L)"       "Serine initial (g/L)"        
[3] "Serine experimental (g/L)"    "Glutamate predicted (g/L)"   
[5] "Glutamate initial (g/L)"      "Glutamate experimental (g/L)"
[7] "Proline predicted (g/L)"     
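
The "wrangling" mentioned above, building the pattern and the lookup from df_glossary instead of typing them by hand, might look roughly like this. This is only a sketch in base R (regexpr and regmatches) so that it runs without extra packages; the names lookup, pattern, m and labels are made up for illustration, and it assumes each short name sits at the start of the string and contains no regex metacharacters.

```r
# Glossary and column names from the question (the elided "..." entries are left out)
df_glossary <- data.frame(
  short = c("Cys", "Pro", "Phe", "Ser", "Glu", "Glc"),
  full = c("Cysteine", "Proline", "Phenylalanine", "Serine", "Glutamate", "Glucose"),
  stringsAsFactors = FALSE
)
plot_variables <- c("Ser predicted (g/L)", "Ser initial (g/L)", "Ser experimental (g/L)",
                    "Glu predicted (g/L)", "Glu initial (g/L)", "Glu experimental (g/L)",
                    "Pro predicted (g/L)")

# Named vector mapping short name -> full name
lookup <- setNames(df_glossary$full, df_glossary$short)

# One alternation pattern built from the glossary: "^(Cys|Pro|...)\\b"
# Assumption: short names contain no regex metacharacters
pattern <- paste0("^(", paste(df_glossary$short, collapse = "|"), ")\\b")

# Locate the leading short name in each string and swap in the full name;
# strings without a match are left unchanged
m <- regexpr(pattern, plot_variables)
labels <- plot_variables
regmatches(labels, m) <- lookup[regmatches(plot_variables, m)]
```

If you prefer to stay with gsubfn, the same glossary can be turned into the list it expects with setNames(as.list(df_glossary$full), df_glossary$short).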

Upvotes: 2

dalloliogm

Reputation: 8940

I am not sure I understood the question, but would this work?

library(dplyr)

df_glossary <- data.frame(
  short = c("Cys", "Pro", "Phe", "Ser", "Glu", "Glc"),
  full = c("Cysteine", "Proline", "Phenylalanine", "Serine", "Glutamate", "Glucose"),
  stringsAsFactors = FALSE
)
plot_variables <- c("Ser predicted (g/L)", "Ser initial (g/L)", "Ser experimental (g/L)", "Glu predicted (g/L)", "Glu initial (g/L)", "Glu experimental (g/L)", "Pro predicted (g/L)")
suffixes <- c("predicted (g/L)", "initial (g/L)", "experimental (g/L)")

# For each glossary row, pair the full name with every suffix
df_glossary %>% rowwise %>% 
    do(data.frame(short=.$short, full=.$full, suffix=suffixes )) %>%
    mutate(label=paste(full, suffix))

short  full           suffix              label
Cys    Cysteine       predicted (g/L)     Cysteine predicted (g/L)
Cys    Cysteine       initial (g/L)       Cysteine initial (g/L)
Cys    Cysteine       experimental (g/L)  Cysteine experimental (g/L)
Pro    Proline        predicted (g/L)     Proline predicted (g/L)
Pro    Proline        initial (g/L)       Proline initial (g/L)
Pro    Proline        experimental (g/L)  Proline experimental (g/L)
Phe    Phenylalanine  predicted (g/L)     Phenylalanine predicted (g/L)
Phe    Phenylalanine  initial (g/L)       Phenylalanine initial (g/L)
Phe    Phenylalanine  experimental (g/L)  Phenylalanine experimental (g/L)
Ser    Serine         predicted (g/L)     Serine predicted (g/L)
Ser    Serine         initial (g/L)       Serine initial (g/L)
Ser    Serine         experimental (g/L)  Serine experimental (g/L)
Glu    Glutamate      predicted (g/L)     Glutamate predicted (g/L)
Glu    Glutamate      initial (g/L)       Glutamate initial (g/L)
Glu    Glutamate      experimental (g/L)  Glutamate experimental (g/L)
Glc    Glucose        predicted (g/L)     Glucose predicted (g/L)
Glc    Glucose        initial (g/L)       Glucose initial (g/L)
Glc    Glucose        experimental (g/L)  Glucose experimental (g/L)
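
If the goal is instead to label only the names that already exist in plot_variables, and to get the two-column names_matching from the question, a lookup with match() may be enough. This is a sketch under one assumption that the question does not state outright: the short name is always the first space-delimited token of each column name. The helper names short_part, rest and idx are made up for illustration.

```r
# Glossary and column names from the question (the elided "..." entries are left out)
df_glossary <- data.frame(
  short = c("Cys", "Pro", "Phe", "Ser", "Glu", "Glc"),
  full = c("Cysteine", "Proline", "Phenylalanine", "Serine", "Glutamate", "Glucose"),
  stringsAsFactors = FALSE
)
plot_variables <- c("Ser predicted (g/L)", "Ser initial (g/L)", "Ser experimental (g/L)",
                    "Glu predicted (g/L)", "Glu initial (g/L)", "Glu experimental (g/L)",
                    "Pro predicted (g/L)")

# Assumption: the short name is everything before the first space
short_part <- sub(" .*$", "", plot_variables)

# Remainder of each name, including the leading space
rest <- substring(plot_variables, nchar(short_part) + 1)

# Row of the glossary for each variable (NA when the short name is unknown)
idx <- match(short_part, df_glossary$short)

# Unknown short names keep their original text as the label
names_matching <- data.frame(
  variable = plot_variables,
  label = ifelse(is.na(idx), plot_variables, paste0(df_glossary$full[idx], rest)),
  stringsAsFactors = FALSE
)
```

Unlike the rowwise version above, this produces one row per existing column name rather than every glossary and suffix combination.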

Upvotes: 0
