CineyEveryday

How can I mutate all columns that match a string in R?

I want to recode all the columns in my dataframe that contain the string "calcium" anywhere in the column name. I'm trying to combine grepl() with mutate() from dplyr, but I get an error.

Any idea what I'm doing wrong? I hope this is possible!

Here is the code I've tried, using dplyr:

#Make the dataframe
library(dplyr)
library(reshape2) # dcast() comes from reshape2
fake <- data.frame(id=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
                   time=c(rep("Time1",9), rep("Time2",9)),
                   test=rep(c("calcium","magnesium","zinc"), 6),
                   score=rnorm(18))
df <- dcast(fake, id ~ time + test, value.var="score")

#My attempt
df <- df %>% mutate(category=cut(df[,grepl("calcium", colnames(df))], breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
#Error:  'x' must be numeric

#My second attempt 
df <- df %>% mutate_at(vars(contains('calcium')), cut(breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
#Error: "argument "x" is missing, with no default"
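Both errors come from how cut() is called, not from the column selection itself. When more than one column matches, df[, grepl(...)] returns a data frame, and cut() only accepts a numeric vector, hence "'x' must be numeric". In the mutate_at() attempt, cut(breaks = ...) is a call rather than a function, so when mutate_at() evaluates it there is no x to cut. A minimal base-R sketch of the first point, using a small hypothetical data frame in place of the dcast() result:

```r
# Hypothetical wide data with two columns whose names contain "calcium"
df <- data.frame(id            = 1:3,
                 Time1_calcium = c(-0.5, 2, 7),
                 Time1_zinc    = c(0.3, -1.1, 0.8),
                 Time2_calcium = c(0.1, 13, 1.5))
brks <- c(-Inf, 1.2, 6, 12, Inf)

# With two matching columns, df[, logical] returns a data frame, not a vector
sel <- df[, grepl("calcium", colnames(df))]
is.data.frame(sel)  # TRUE

# cut() rejects a data frame with the same error as the first attempt
msg <- tryCatch(cut(sel, breaks = brks), error = conditionMessage)
print(msg)

# cut() works on a single numeric column
one <- cut(df$Time1_calcium, breaks = brks, labels = c(0, 1, 2, 3))
print(one)
```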


Answers (1)

william3031

Is this what you are after?

library(tidyverse)
library(reshape2) # needed for dcast()

fake <-data.frame(id=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),              
                  time=c(rep("Time1",9), rep("Time2",9)), 
                  test=c("calcium","magnesium","zinc","calcium","magnesium","zinc", 
                         "calcium","magnesium","zinc","calcium","magnesium","zinc",
                         "calcium","magnesium","zinc","calcium","magnesium","zinc"), 
                  score=rnorm(18))
df <- dcast(fake, id ~ time + test)
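df <- as_tibble(df) # convert to a tibble for nicer printing

#code
df <- df %>% 
  # turn each calcium column into a factor with labels 0-3
  mutate_at(vars(contains('calcium')), 
            ~cut(., 
                 breaks=c(-Inf, 1.2, 6, 12, Inf), 
                 labels=c(0, 1, 2, 3))) %>%
  # convert the factors to numbers (funs() is deprecated, so use a lambda);
  # note that as.numeric() on a factor returns the level codes 1-4
  mutate_at(vars(ends_with("_calcium")), ~as.numeric(.))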
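In dplyr 1.0.0 and later, mutate_at() is superseded by across() inside mutate(). A sketch of the same recoding, assuming dplyr >= 1.0 and using a small hypothetical data frame in place of the dcast() output. matches() takes a regular expression, much like grepl():

```r
library(dplyr)

# Hypothetical wide data standing in for the dcast() result
df <- data.frame(id              = 1:3,
                 Time1_calcium   = c(-0.5, 2, 7),
                 Time1_magnesium = c(0.3, -1.1, 0.8),
                 Time2_calcium   = c(0.1, 13, 1.5),
                 Time2_magnesium = c(1.4, 0.2, -0.7))

# Recode every column whose name matches "calcium"; other columns are untouched
recoded <- df %>%
  mutate(across(matches("calcium"),
                ~ cut(.x,
                      breaks = c(-Inf, 1.2, 6, 12, Inf),
                      labels = c(0, 1, 2, 3))))

print(recoded)
```

Inside the lambda, .x is the current column. Passing .names = "{.col}_cat" to across() would keep the original columns and write the categories to new ones instead.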

This produces the following (the values will differ between runs, since score comes from rnorm()):

# A tibble: 3 x 7
     id Time1_calcium Time1_magnesium Time1_zinc Time2_calcium Time2_magnesium
  <dbl>         <dbl>           <dbl>      <dbl>         <dbl>           <dbl>
1     1             2          -0.256      0.303             1          0.144 
2     2             2           2.18       0.417             1          0.0650
3     3             1           0.863     -2.32              1          0.163 
# ... with 1 more variable: Time2_zinc <dbl>
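Note that cut() returns a factor, and as.numeric() on a factor gives the position of each level (1 to 4 here), not the labels 0 to 3. The 1s and 2s in the calcium columns above are therefore level codes: 1 stands for label "0" and 2 for label "1". If you want the labels themselves as numbers, here is a minimal sketch of two options on a hypothetical vector:

```r
x <- c(-0.5, 2, 7, 15)
brks <- c(-Inf, 1.2, 6, 12, Inf)
f <- cut(x, breaks = brks, labels = c(0, 1, 2, 3))

codes  <- as.numeric(f)                # level positions: 1 2 3 4
values <- as.numeric(as.character(f))  # the labels as numbers: 0 1 2 3

# Alternatively, ask cut() for integer codes directly and shift them down by one
shifted <- cut(x, breaks = brks, labels = FALSE) - 1
```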

Based on this tutorial: https://suzan.rbind.io/2018/02/dplyr-tutorial-2/#mutate-at-to-change-specific-columns
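Since the question started from grepl(), the same recoding also works without dplyr. A base-R sketch, again on a small hypothetical data frame, that uses the logical vector from grepl() to pick the columns and lapply() to run cut() on each one:

```r
# Hypothetical wide data
df <- data.frame(id            = 1:3,
                 Time1_calcium = c(-0.5, 2, 7),
                 Time1_zinc    = c(0.3, -1.1, 0.8),
                 Time2_calcium = c(0.1, 13, 1.5))

# Logical index of the columns whose names contain "calcium"
cols <- grepl("calcium", names(df))

# Apply cut() to each selected column; lapply() passes breaks and labels through
df[cols] <- lapply(df[cols], cut,
                   breaks = c(-Inf, 1.2, 6, 12, Inf),
                   labels = c(0, 1, 2, 3))

str(df)
```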
