Wet Feet

Reputation: 4555

R: gsub, pattern = vector and replacement = vector

As the title states, I am trying to use gsub with a vector for both "pattern" and "replacement". Currently, I have code that looks like this:

  names(x1) <- gsub("2110027599", "Inv1", names(x1)) #x1 is a data frame
  names(x1) <- gsub("2110025622", "Inv2", names(x1))
  names(x1) <- gsub("2110028045", "Inv3", names(x1))
  names(x1) <- gsub("2110034716", "Inv4", names(x1))
  names(x1) <- gsub("2110069349", "Inv5", names(x1))
  names(x1) <- gsub("2110023264", "Inv6", names(x1))

What I hope to do is something like this:

  a <- c("2110027599","2110025622","2110028045","2110034716", "2110069349", "2110023264")
  b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")
  names(x1) <- gsub(a,b,names(x1))

I'm guessing there is an apply function somewhere that can do this, but I am not very sure which one to use!

EDIT: names(x1) looks like this (there are many more columns, but I'm leaving them out):

> names(x1)
  [1] "2110023264A.Ms.Amp"        "2110023264A.Ms.Vol"        "2110023264A.Ms.Watt"       "2110023264A1.Ms.Amp"      
  [5] "2110023264A2.Ms.Amp"       "2110023264A3.Ms.Amp"       "2110023264A4.Ms.Amp"       "2110023264A5.Ms.Amp"      
  [9] "2110023264B.Ms.Amp"        "2110023264B.Ms.Vol"        "2110023264B.Ms.Watt"       "2110023264B1.Ms.Amp"      
 [13] "2110023264Error"           "2110023264E-Total"         "2110023264GridMs.Hz"       "2110023264GridMs.PhV.phsA"
 [17] "2110023264GridMs.PhV.phsB" "2110023264GridMs.PhV.phsC" "2110023264GridMs.TotPFPrc" "2110023264Inv.TmpLimStt"  
 [21] "2110023264InvCtl.Stt"      "2110023264Mode"            "2110023264Mt.TotOpTmh"     "2110023264Mt.TotTmh"      
 [25] "2110023264Op.EvtCntUsr"    "2110023264Op.EvtNo"        "2110023264Op.GriSwStt"     "2110023264Op.TmsRmg"      
 [29] "2110023264Pac"             "2110023264PlntCtl.Stt"     "2110023264Serial Number"   "2110025622A.Ms.Amp"       
 [33] "2110025622A.Ms.Vol"        "2110025622A.Ms.Watt"       "2110025622A1.Ms.Amp"       "2110025622A2.Ms.Amp"      
 [37] "2110025622A3.Ms.Amp"       "2110025622A4.Ms.Amp"       "2110025622A5.Ms.Amp"       "2110025622B.Ms.Amp"       
 [41] "2110025622B.Ms.Vol"        "2110025622B.Ms.Watt"       "2110025622B1.Ms.Amp"       "2110025622Error"          
 [45] "2110025622E-Total"         "2110025622GridMs.Hz"       "2110025622GridMs.PhV.phsA" "2110025622GridMs.PhV.phsB"

What I hope to get is this:

> names(x1)
  [1] "Inv6A.Ms.Amp"        "Inv6A.Ms.Vol"        "Inv6A.Ms.Watt"       "Inv6A1.Ms.Amp"       "Inv6A2.Ms.Amp"      
  [6] "Inv6A3.Ms.Amp"       "Inv6A4.Ms.Amp"       "Inv6A5.Ms.Amp"       "Inv6B.Ms.Amp"        "Inv6B.Ms.Vol"       
 [11] "Inv6B.Ms.Watt"       "Inv6B1.Ms.Amp"       "Inv6Error"           "Inv6E-Total"         "Inv6GridMs.Hz"      
 [16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC" "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt"  
 [21] "Inv6InvCtl.Stt"      "Inv6Mode"            "Inv6Mt.TotOpTmh"     "Inv6Mt.TotTmh"       "Inv6Op.EvtCntUsr"   
 [26] "Inv6Op.EvtNo"        "Inv6Op.GriSwStt"     "Inv6Op.TmsRmg"       "Inv6Pac"             "Inv6PlntCtl.Stt"    
 [31] "Inv6Serial Number"   "Inv2A.Ms.Amp"        "Inv2A.Ms.Vol"        "Inv2A.Ms.Watt"       "Inv2A1.Ms.Amp"      
 [36] "Inv2A2.Ms.Amp"       "Inv2A3.Ms.Amp"       "Inv2A4.Ms.Amp"       "Inv2A5.Ms.Amp"       "Inv2B.Ms.Amp"       
 [41] "Inv2B.Ms.Vol"        "Inv2B.Ms.Watt"       "Inv2B1.Ms.Amp"       "Inv2Error"           "Inv2E-Total"        
 [46] "Inv2GridMs.Hz"       "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB" 

Upvotes: 52

Views: 35250

Answers (6)

JWilliman

Reputation: 3883

From the stringr documentation of str_replace_all: "If you want to apply multiple patterns and replacements to the same string, pass a named version to pattern."

Thus, using a, b, and names(x1) from above:

stringr::str_replace_all(names(x1), setNames(b, a))

EDIT

stringr::str_replace_all calls stringi::stri_replace_all_regex, which can be used directly and is quite a bit quicker.

x <- names(x1)
pattern <- a
replace <- b

microbenchmark::microbenchmark(
  str  = stringr::str_replace_all(x, setNames(replace, pattern)),
  stri = stringi::stri_replace_all_regex(x, pattern, replace, vectorize_all = FALSE)
  )

Unit: microseconds
 expr    min      lq     mean  median   uq    max neval cld
  str 1022.1 1070.45 1286.547 1175.55 1309 2526.8   100   b
 stri  145.2  150.45  190.124  160.55  178  457.9   100  a 

Upvotes: 34

Jenna Allen

Reputation: 534

I needed to do something similar but had to use base R. As long as your vectors are the same length, I think this will work:

for (i in seq_along(a)){
  names(x1) <- gsub(a[i], b[i], names(x1))
} 
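The same loop can also be written as a fold, which avoids reassigning names(x1) on every pass. This is only a sketch: replace_all is a hypothetical helper name, and fixed = TRUE is an assumption that the patterns are literal strings rather than regular expressions.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn to x.
replace_all <- function(x, patterns, replacements, ...) {
  stopifnot(length(patterns) == length(replacements))
  # Reduce threads the partially renamed vector (acc) through every index i.
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, ...),
    seq_along(patterns),
    x
  )
}

a <- c("2110027599", "2110025622", "2110023264")
b <- c("Inv1", "Inv2", "Inv6")
x <- c("2110023264A.Ms.Amp", "2110025622Error")

# fixed = TRUE treats the patterns as literal text, not regex.
renamed <- replace_all(x, a, b, fixed = TRUE)
```

With a data frame this would be used as names(x1) <- replace_all(names(x1), a, b, fixed = TRUE).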

Upvotes: 5

Tyler Rinker

Reputation: 110062

Lots of solutions already; here is one more:

The qdap package:

library(qdap)
names(x1) <- mgsub(a,b,names(x1))

Upvotes: 33

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

New Answer

If we can make another assumption, the following should work. The assumption this time is that you are really interested in substituting the first 10 characters from each value in names(x1).

Here, I've stored names(x1) as a character vector named "X1". The solution essentially uses substr to separate the values in X1 into 2 parts, match to figure out the correct replacement option, and paste to put everything back together.

a <- c("2110027599", "2110025622", "2110028045",
       "2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")

X1pre <- substr(X1, 1, 10)
X1post <- substr(X1, 11, max(nchar(X1)))

paste0(b[match(X1pre, a)], X1post)
#  [1] "Inv6A.Ms.Amp"        "Inv6A.Ms.Vol"        "Inv6A.Ms.Watt"      
#  [4] "Inv6A1.Ms.Amp"       "Inv6A2.Ms.Amp"       "Inv6A3.Ms.Amp"      
#  [7] "Inv6A4.Ms.Amp"       "Inv6A5.Ms.Amp"       "Inv6B.Ms.Amp"       
# [10] "Inv6B.Ms.Vol"        "Inv6B.Ms.Watt"       "Inv6B1.Ms.Amp"      
# [13] "Inv6Error"           "Inv6E-Total"         "Inv6GridMs.Hz"      
# [16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC"
# [19] "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt"   "Inv6InvCtl.Stt"     
# [22] "Inv6Mode"            "Inv6Mt.TotOpTmh"     "Inv6Mt.TotTmh"      
# [25] "Inv6Op.EvtCntUsr"    "Inv6Op.EvtNo"        "Inv6Op.GriSwStt"    
# [28] "Inv6Op.TmsRmg"       "Inv6Pac"             "Inv6PlntCtl.Stt"    
# [31] "Inv6Serial Number"   "Inv2A.Ms.Amp"        "Inv2A.Ms.Vol"       
# [34] "Inv2A.Ms.Watt"       "Inv2A1.Ms.Amp"       "Inv2A2.Ms.Amp"      
# [37] "Inv2A3.Ms.Amp"       "Inv2A4.Ms.Amp"       "Inv2A5.Ms.Amp"      
# [40] "Inv2B.Ms.Amp"        "Inv2B.Ms.Vol"        "Inv2B.Ms.Watt"      
# [43] "Inv2B1.Ms.Amp"       "Inv2Error"           "Inv2E-Total"        
# [46] "Inv2GridMs.Hz"       "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB"
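One caveat with the substr/match approach: a name whose first 10 characters are not in a gets NA from match, and paste0 then turns that into the literal text "NA". A minimal sketch of a guard, assuming such names should be left unchanged (the "Timestamp" column is made up for illustration):

```r
a <- c("2110027599", "2110025622", "2110023264")
b <- c("Inv1", "Inv2", "Inv6")

# "Timestamp" is a hypothetical column with no inverter prefix.
X1 <- c("2110023264A.Ms.Amp", "2110025622Error", "Timestamp")

X1pre  <- substr(X1, 1, 10)          # candidate 10-character prefix
X1post <- substr(X1, 11, nchar(X1))  # everything after the prefix
idx    <- match(X1pre, a)            # NA where the prefix is not in a

# Keep the original name when there is no matching prefix.
renamed <- ifelse(is.na(idx), X1, paste0(b[idx], X1post))
```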

Old Answer

If we can assume that names(x1) is in the same order as the pattern and replacement and that it is basically a one-for-one replacement, you might be able to get away with just sapply.

Here's an example of that particular situation:

Imagine names(x1) looks something like this:

X1 <- paste0("A2", a, sequence(length(a)))
X1
# [1] "A221100275991" "A221100256222" "A221100280453" 
# [4] "A221100347164" "A221100693495" "A221100232646"

Here's our pattern and replacement vectors:

a <- c("2110027599", "2110025622", "2110028045", 
       "2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")

This is how we might use sapply if these assumptions are valid.

sapply(seq_along(a), function(x) gsub(a[x], b[x], X1[x]))
# [1] "A2Inv11" "A2Inv22" "A2Inv33" "A2Inv44" "A2Inv55" "A2Inv66"

Upvotes: 11

Richie Cotton

Reputation: 121177

Try mapply.

names(x1) <- mapply(gsub, a, b, names(x1), USE.NAMES = FALSE)

Or, even easier, str_replace from stringr.

library(stringr)
names(x1) <- str_replace(names(x1), a, b)
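Note that mapply calls gsub on the first pattern, first replacement, and first name together, then the second of each, and so on. The following sketch, on made-up two-element vectors, shows what happens when the names are and are not in the same order as a and b:

```r
a <- c("2110027599", "2110025622")
b <- c("Inv1", "Inv2")

# Names in the same order as the patterns: each pair lines up.
aligned <- c("2110027599Pac", "2110025622Pac")
res_aligned <- mapply(gsub, a, b, aligned, USE.NAMES = FALSE)

# Same names in reverse order: pattern i is never tried on the name
# that contains it, so nothing is replaced.
shuffled <- rev(aligned)
res_shuffled <- mapply(gsub, a, b, shuffled, USE.NAMES = FALSE)
```

So this approach fits the case where names(x1) has one entry per pattern, in matching order. For the longer names(x1) shown in the question, each pattern needs to be applied to every name instead.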

Upvotes: 3

Simon O'Hanlon

Reputation: 60000

Somehow names<- and match seem much more appropriate here...

names( x1 ) <- b[ match( names( x1 ) , a ) ]

But I am making the assumption that the elements of vector a are the actual names of your data.frame.

If a really is a pattern found within each of the names of x1 then this grepl approach with names<- could be useful...

new <- sapply( a , grepl , x = names( x1 ) )
names( x1 ) <- b[ apply( new , 1 , which.max ) ]
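Two things to be aware of with the grepl approach: it replaces the whole name with the matching element of b, and which.max returns 1 for a row with no TRUE at all, so an unmatched name would silently get b[1]. Below is a sketch of a variant that keeps the rest of each name and skips unmatched names. It assumes literal patterns (fixed = TRUE) and more than one name, and "Timestamp" is a made-up column.

```r
a <- c("2110027599", "2110025622", "2110023264")
b <- c("Inv1", "Inv2", "Inv6")
x <- c("2110023264A.Ms.Amp", "2110025622Error", "Timestamp")

# Logical matrix: one row per name, one column per pattern.
hits    <- sapply(a, grepl, x = x, fixed = TRUE)
matched <- rowSums(hits) > 0            # names containing at least one pattern
idx     <- apply(hits, 1, which.max)    # first matching pattern per row

out <- x
# Substitute only the matched prefix, and only in names that matched.
out[matched] <- mapply(
  sub,
  a[idx[matched]], b[idx[matched]], x[matched],
  MoreArgs = list(fixed = TRUE),
  USE.NAMES = FALSE
)
```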

Upvotes: 2
