dpprdan
dpprdan

Reputation: 1817

How can I manipulate dataframe columns with different values from an external vector (with dplyr)

In R, I would like to manipulate (say multiply) data.frame columns with appropriately named values stored in a vector (or data.frame, if that's easier).

Let's say, I want to first summarise the variables disp, hp, and wt from the mtcars dataset.

vars <- c("disp", "hp", "wt")
mtcars %>% 
  summarise_at(vars, funs(sum(.))

(throw a group_by(cyl) into the mix, or use mutate_at if you'd like to have more rows)

Now I'd like to multiply each of the resulting columns with a particular value, given by

multiplier <- c("disp" = 2, "hp" = 3, "wt" = 4)

Is it possible to refer to these within the summarise_at function?

The result should look like this (and I don't want to have to refer to the variable names directly while getting there):

disp    hp    wt
14766.2 14082 411.808

UPDATE:

Maybe my MWE was too minimal. Let's say I want to do the same operation with a data.frame grouped by cyl

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at(vars, sum) 

The result should thus be:

    cyl   disp   hp      wt
1     4 2313.0 2727 100.572
2     6 2566.4 2568  87.280
3     8 9886.8 8787 223.956

UPDATE 2:

Maybe I was not explicit enough here either, but the columns in the data.frame should be multiplied by the respective values in the vector (and only those columns mentioned in the vector), so e.g. disp should be multiplied by 2, hp by 3 and wt by 4, all other variables (e.g. cyl) should remain untouched by the multiplication.

Upvotes: 1

Views: 318

Answers (2)

akrun
akrun

Reputation: 886938

We could also do this with map function from purrr

library(purrr)
mtcars %>%
    summarise_at(vars, sum) %>%
    map2_df(multiplier, `*`)
#      disp    hp      wt
#     <dbl> <dbl>   <dbl>
# 1 14766.2 14082 411.808

For the updated question

d1 <- mtcars %>% 
         group_by(cyl) %>% 
         summarise_at(vars, sum) 
d1 %>% 
   select(one_of(vars)) %>% 
   map2_df(multiplier[vars], ~ .x * .y) %>%
   bind_cols(d1 %>% select(-one_of(vars)), .) 
#    cyl   disp    hp      wt
#  <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956

Or we can use gather/spread

library(tidyr)
mtcars %>% 
    group_by(cyl) %>% 
    summarise_at(vars, sum) %>% 
    gather(var, val, -cyl) %>% 
    mutate(val = val*multiplier[match(var, names(multiplier))]) %>% 
    spread(var, val)
#     cyl   disp    hp      wt
#   <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956

Upvotes: 1

conor
conor

Reputation: 1287

I am not sure if you can do this in the summarise_at function, but this is close alternative...

library(dplyr)
library(purrr)

vars <- c("disp", "hp", "wt")
multiplier <- c("disp" = 2, "hp" = 3, "wt" = 4)

mtcars %>% 
    summarise_at(vars, sum) %>% 
    do(. * multiplier) 

     disp    hp      wt
1 14766.2 14082 411.808

****REDUX****

Include the grouping var cyl in the multiplier and set it equal to 1. @akrun's map2_df does the real work here.

vars <- c("disp", "hp", "wt")
multiplier <- c("cyl" = 1, "disp" = 2, "hp" = 3, "wt" = 4)

mtcars %>% 
    group_by(cyl) %>% 
    summarise_at(vars, sum) %>% 
    map2_df(multiplier, ~ .x * .y) 


    cyl   disp    hp      wt
  <dbl>  <dbl> <dbl>   <dbl>
1     4 2313.0  2727 100.572
2     6 2566.4  2568  87.280
3     8 9886.8  8787 223.956

Upvotes: 0

Related Questions