qnp1521

Reputation: 896

R mutate_at on a subset of rows

My question is similar to this post (Applying mutate_at conditionally to specific rows in a dataframe in R), and I could reproduce its result. But when I tried to apply this to my problem, which is putting parentheses around the cell values for selected rows and columns, I ran into error messages. Here's a reproducible example.

df <- structure(list(dep = c("cyl", "cyl", "disp", "disp", "drat", 
"drat", "hp", "hp", "mpg", "mpg"), name = c("estimate", "t_stat", 
"estimate", "t_stat", "estimate", "t_stat", "estimate", "t_stat", 
"estimate", "t_stat"), dat1 = c(1.151, 6.686, 102.902, 12.107, 
-0.422, -5.237, 37.576, 5.067, -5.057, -8.185), dat2 = c(1.274, 
8.423, 106.429, 12.148, -0.394, -5.304, 38.643, 6.172, -4.843, 
-10.622), dat3 = c(1.078, 5.191, 103.687, 7.79, -0.194, -2.629, 
36.777, 4.842, -4.539, -7.91)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))  

Given the above data frame, I hope to put parentheses around the cell values of columns dat1, dat2 and dat3 when name == t_stat. Here's what I've tried, but it seems that paste0 is not accepted inside the case_when function in this case.

require(tidyverse)
df %>% mutate_at(vars(matches("dat")), 
                 funs( case_when(name == 't_stat' ~ paste0("(", ., ")"), TRUE ~ .) )) 
Error: must be a character vector, not a double vector

When I use brute force, namely mutating each column separately, it works, but my actual problem has more than 10 columns, so this is not really practical.

require(tidyverse)
df %>% mutate(dat1 = ifelse(name == "t_stat", paste0("(", dat1, ")"), dat1),
              dat2 = ifelse(name == "t_stat", paste0("(", dat2, ")"), dat2),
              dat3 = ifelse(name == "t_stat", paste0("(", dat3, ")"), dat3))
# A tibble: 10 x 5
   dep   name     dat1     dat2      dat3    
   <chr> <chr>    <chr>    <chr>     <chr>   
 1 cyl   estimate 1.151    1.274     1.078   
 2 cyl   t_stat   (6.686)  (8.423)   (5.191) 
 3 disp  estimate 102.902  106.429   103.687 
 4 disp  t_stat   (12.107) (12.148)  (7.79)  
 5 drat  estimate -0.422   -0.394    -0.194  
 6 drat  t_stat   (-5.237) (-5.304)  (-2.629)
 7 hp    estimate 37.576   38.643    36.777  
 8 hp    t_stat   (5.067)  (6.172)   (4.842) 
 9 mpg   estimate -5.057   -4.843    -4.539  
10 mpg   t_stat   (-8.185) (-10.622) (-7.91)

Upvotes: 1

Views: 848

Answers (3)

Neel Kamal

Reputation: 1076

Basically, you need to convert dbl to char first, which is also what the error message says: Error: must be a character vector, not a double vector

As @Rohan rightly said, case_when is type-strict, meaning it expects every branch to return the same class.
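
To see the type strictness in isolation, here is a minimal sketch on two plain vectors. The names x, is_t, mixed and fixed are made up for illustration, and the exact wording of the error depends on the installed dplyr version, so the sketch only checks that an error occurs.

```r
library(dplyr)

# Hypothetical stand-ins for one dat column and the name == "t_stat" test
x    <- c(1.151, 6.686)
is_t <- c(FALSE, TRUE)

# One branch returns character, the other double: case_when() refuses to combine them
mixed <- tryCatch(
  case_when(is_t ~ paste0("(", x, ")"), TRUE ~ x),
  error = function(e) e
)
inherits(mixed, "error")

# Converting the fallback branch to character makes both branches agree
fixed <- case_when(is_t ~ paste0("(", x, ")"), TRUE ~ as.character(x))
fixed
```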

df %>% mutate_at(vars(matches("dat")),
                 ~case_when(name =='t_stat'~ paste0("(",as.character(.x),")"),
                            TRUE ~ as.character(.x))
                 )

Output:

# A tibble: 10 x 5
   dep   name     dat1     dat2      dat3    
   <chr> <chr>    <chr>    <chr>     <chr>   
 1 cyl   estimate 1.151    1.274     1.078   
 2 cyl   t_stat   (6.686)  (8.423)   (5.191) 
 3 disp  estimate 102.902  106.429   103.687 
 4 disp  t_stat   (12.107) (12.148)  (7.79)  
 5 drat  estimate -0.422   -0.394    -0.194  
 6 drat  t_stat   (-5.237) (-5.304)  (-2.629)
 7 hp    estimate 37.576   38.643    36.777  
 8 hp    t_stat   (5.067)  (6.172)   (4.842) 
 9 mpg   estimate -5.057   -4.843    -4.539  
10 mpg   t_stat   (-8.185) (-10.622) (-7.91) 

Upvotes: 1

Ronak Shah

Reputation: 388907

case_when is type-strict, meaning it expects every branch to return the same class. Your original columns are numeric, whereas wrapping the values in "(" and ")" turns them into character.

Also, funs has long been deprecated, and mutate_at will soon be replaced with across.

library(dplyr)
df %>% 
    mutate_at(vars(matches("dat")), 
      ~case_when(name == 't_stat' ~ paste0("(", ., ")"), TRUE ~ as.character(.)))
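
For reference, the same operation written with across() might look like the sketch below. It assumes dplyr 1.0.0 or later, and the two-row tibble small is a made-up stand-in for df so that the snippet runs on its own.

```r
library(dplyr)

# Hypothetical two-row stand-in for df (values taken from its first two rows)
small <- tibble(
  name = c("estimate", "t_stat"),
  dat1 = c(1.151, 6.686),
  dat2 = c(1.274, 8.423)
)

# across() picks the columns; the lambda can still refer to the name column
out <- small %>%
  mutate(across(matches("dat"),
                ~ case_when(name == "t_stat" ~ paste0("(", .x, ")"),
                            TRUE ~ as.character(.x))))
out
```

Replacing small with df should give the same 10-row character output as the mutate_at version.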

Upvotes: 1

David T

Reputation: 2143

The error message is ... unhelpful.

Your problem is that you're mixing numeric and character data in a column. The dat variables are numeric.

# funs() is deprecated, so a ~ lambda is used here instead
df %>% mutate_at(vars(matches("dat")), 
                 ~ case_when(name == 't_stat' ~ paste0("(", ., ")"),
                             TRUE ~ as.character(.)))

# A tibble: 10 x 5
   dep   name     dat1     dat2      dat3    
   <chr> <chr>    <chr>    <chr>     <chr>   
 1 cyl   estimate 1.151    1.274     1.078   
 2 cyl   t_stat   (6.686)  (8.423)   (5.191) 
 3 disp  estimate 102.902  106.429   103.687 
 4 disp  t_stat   (12.107) (12.148)  (7.79)  
 5 drat  estimate -0.422   -0.394    -0.194  
 6 drat  t_stat   (-5.237) (-5.304)  (-2.629)
 7 hp    estimate 37.576   38.643    36.777  
 8 hp    t_stat   (5.067)  (6.172)   (4.842) 
 9 mpg   estimate -5.057   -4.843    -4.539  
10 mpg   t_stat   (-8.185) (-10.622) (-7.91) 

Upvotes: 1
