Krantz

Reputation: 1493

How to use mutate to create multiple variables with regex

I want to create new variables whose names are built with paste0, computing each one from other variables whose names are also built with paste0. My approach fails: it only pastes the names of the new variables as observations, without generating the actual values. Thanks in advance for any help.

My data:

tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L), 
        d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L, 
        3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L, 
        5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))

tempDF$d1 <- stats::relevel(as.factor(tempDF$d1), ref = "A")      
lLevels<-levels(tempDF$d1)


View(tempDF)

My attempt

func<-function(tempDF, lLevels){

for(aLevelNum in seq_along(lLevels)){

tempDF_new<-tempDF%>%
  mutate(paste0("newp_", lLevels[aLevelNum], "_vs_", lLevels[1])=d2*paste0("p", "_", lLevels[aLevelNum])))%>%
  dplyr::select(-contains(paste0("newp_", lLevels[1], "_vs_", lLevels[1])))%>%
        as.data.frame(.)



}

return(tempDF_new)      

}

tempDF_new<-func(tempDF, lLevels) %>%
        as.data.frame(.)

View(tempDF_new)

Expected output

tempDF_new
  d1 d2 d3 d4 p_A p_B p_C p4 newp_B_vs_A newp_C_vs_A
1  A 40 20 60   1   3   2  5         120          80
2  B 50 40 30   3   4   1  5         200          50
3  C 20 50 30   2   3   1  4          60          20
4  A 50 40 60   3   3   2  5         150         100
5  C 20 50 30   2   4   1  4          80          20

Upvotes: 1

Views: 357

Answers (2)

rferrisx

Reputation: 1728

This question had a data.table tag, but it features tidyverse answers. In data.table:

DT[, newvar:= paste0(var1/var2)] would create a character field from the original numeric op.

DT[, newvar:= var1/var2 ] would create a numeric field if var1 and var2 were integer or numeric.

help(set) walks you through what is functionally the insert operator (':='). This operator is optimized, fast, and flexible.
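As a rough sketch of how this could apply to the question, ':=' also accepts a character vector of new column names when it is wrapped in parentheses. The table below is a cut-down copy of the first three rows of the question's data, and src and new_names are illustrative names, not part of the original post:

```r
library(data.table)

# First three rows of the question's data, only the columns used here
DT <- data.table(d2  = c(40L, 50L, 20L),
                 p_B = c(3L, 4L, 3L),
                 p_C = c(2L, 1L, 1L))

src       <- c("p_B", "p_C")               # source columns
new_names <- paste0("new", src, "_vs_A")   # "newp_B_vs_A", "newp_C_vs_A"

# (new_names) := ... assigns to the names stored in the vector, by reference;
# .SDcols restricts .SD to the source columns, and d2 stays visible inside j
DT[, (new_names) := lapply(.SD, function(p) d2 * p), .SDcols = src]
```

Because the assignment happens by reference, DT itself gains the two columns and no reassignment is needed.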

Upvotes: 1

A. Suliman

Reputation: 13125

We can use gsub to get the desired names

 library(dplyr)
 tempDF %>% 
        mutate_at(vars('p_B','p_C'), list(newp = ~d2*.)) %>% 
        #create two groups then flip them, put _ in the middle and add _vs_A at the end 
        #For groups in gsub see e.g 
        #https://stackoverflow.com/questions/35463256/r-gsub-inserting-whitespaces-between-capture-groups
        rename_at(vars(ends_with('newp')), ~gsub('p_(.*)_(newp)','\\2_\\1_vs_A',.))

  d1 d2 d3 d4 p_A p_B p_C p4 newp_B_vs_A newp_C_vs_A
1  A 40 20 60   1   3   2  5         120          80
2  B 50 40 30   3   4   1  5         200          50
3  C 20 50 30   2   3   1  4          60          20
4  A 50 40 60   3   3   2  5         150         100
5  C 20 50 30   2   4   1  4          80          20
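If dplyr 1.0.0 or later is available, a possible alternative is across() with its .names argument, which names the new columns directly and so skips the rename step. This is only a sketch on a cut-down copy of the question's data (first three rows); ref, src and spec are illustrative names:

```r
library(dplyr)

# First three rows of the question's data, only the columns used here
df <- data.frame(d2  = c(40L, 50L, 20L),
                 p_A = c(1L, 3L, 2L),
                 p_B = c(3L, 4L, 3L),
                 p_C = c(2L, 1L, 1L))
lLevels <- c("A", "B", "C")

ref  <- lLevels[1]                           # reference level, "A"
src  <- paste0("p_", setdiff(lLevels, ref))  # "p_B", "p_C"
spec <- paste0("new{.col}_vs_", ref)         # glue spec: "new{.col}_vs_A"

# {.col} is replaced by each source column name, giving newp_B_vs_A, newp_C_vs_A
res <- df %>%
  mutate(across(all_of(src), ~ d2 * .x, .names = spec))
```

Building src and spec from lLevels keeps the reference level in one place, so changing ref should be enough to compare against a different level.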

Update

#Replace map2_dfc with map2 if you need a list 
library(purrr)
res <- map2_dfc(unique(tempDF$d1), c('d2','d3','d4'), function(x,y) {
  #conds is B|C in the 1st iteration, A|C in the 2nd, A|B in the 3rd
  conds <- paste(setdiff(unique(tempDF$d1), x), collapse = "|")

  #y takes the values d2, d3, d4
  #x takes the values A, B, C
  tmp <- select(tempDF, y, matches(conds)) %>% 
    mutate_at(vars(matches(conds)), list(newp=~!!sym(y)*.))%>%
    rename_at(vars(ends_with('newp')), list(~gsub('p_(.*)_(newp)', paste0('\\2_\\1_vs_',x), .))) %>% 
    select(ends_with(x))
})

bind_cols(tempDF, res)



  d1 d2 d3 d4 p_A p_B p_C p4 newp_B_vs_A newp_C_vs_A newp_A_vs_B newp_C_vs_B newp_A_vs_C newp_B_vs_C
1  A 40 20 60   1   3   2  5         120          80          20          40          60         180
2  B 50 40 30   3   4   1  5         200          50         120          40          90         120
3  C 20 50 30   2   3   1  4          60          20         100          50          60          90
4  A 50 40 60   3   3   2  5         150         100         120          80         180         180
5  C 20 50 30   2   4   1  4          80          20         100          50          60         120

Upvotes: 2
