Ruben Kazumov

Reputation: 3872

Use names of data.frame columns as elements of a function argument

Consider this example:

library(tidyverse)

# function my.sum(v)
# arguments:
#  v - numeric vector
# returns:
#  numeric value
#
# accepts the vector as single argument,
# returns sum of the vector elements.
my.sum <- function (v) {
  return(sum(v))
}  

# function my.inverse(val)
# arguments:
#  val - numeric value
# returns:
#  numeric value
#
# changes sign of the argument and returns it
my.inverse <- function (val) {
  return(- val)
}

data.frame(x = rnorm(10, mean = 0, sd = 1), 
           y = rnorm(10, mean = 5, sd = 3)) %>%
  mutate(my_sum = my.sum(c(x,y)),
         my_inverse = my.inverse(x))

The result of execution is:

            x        y   my_sum my_inverse
1  -1.3299817 3.359306 49.23083  1.3299817
2   1.3657651 4.636359 49.23083 -1.3657651
3  -0.2122119 1.760494 49.23083  0.2122119
4   0.7002765 7.396804 49.23083 -0.7002765
5  -0.5828975 4.811493 49.23083  0.5828975
6   1.1202625 4.294421 49.23083 -1.1202625
7   1.2512032 4.907165 49.23083 -1.2512032
8   0.9228939 5.215929 49.23083 -0.9228939
9   0.1800447 0.666941 49.23083 -0.1800447
10 -0.8906996 9.657261 49.23083  0.8906996

As one can see, the column my_inverse, which holds the return value of my.inverse(val), takes the values from column x, negates them and stores the result as expected.
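A minimal sketch of why my_inverse behaves as expected: mutate() calls the function once and passes it the whole column, and unary minus is vectorised, so the result still has one value per row. The small data frame `df` below is hypothetical, chosen only for illustration.

```r
library(dplyr)

# Same logic as the question's my.inverse(): negate the input
my.inverse <- function(val) {
  -val
}

# Hypothetical toy data, not the random data from the question
df <- data.frame(x = c(1, -2, 3), y = c(10, 20, 30))

# Called directly on the whole column, the function negates every element
my.inverse(df$x)
# [1] -1  2 -3

# mutate() does the same thing: one call, whole column in, whole column out
out <- df %>% mutate(my_inverse = my.inverse(x))
out$my_inverse
# [1] -1  2 -3
```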

The column my_sum, which holds the return value of my.sum(v), contains the same constant in every row. This constant is the sum of all elements of the two whole vectors x and y joined together:

sum(c(x, y))
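A minimal sketch of where the constant comes from, again on a hypothetical toy data frame: inside mutate(), c(x, y) joins the two whole columns into a single vector, sum() collapses it to one number, and mutate() recycles that length-1 result into every row.

```r
library(dplyr)

# Same logic as the question's my.sum(): sum all elements of the input
my.sum <- function(v) {
  sum(v)
}

# Hypothetical toy data, not the random data from the question
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# c(x, y) is a length-6 vector, so the sum is a single value
total <- my.sum(c(df$x, df$y))
total
# [1] 66

# mutate() recycles the single value to fill all rows
out <- df %>% mutate(my_sum = my.sum(c(x, y)))
out$my_sum
# [1] 66 66 66
```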

I expected that my.sum() inside the mutate() call would receive the x and y values of a single row and join them with c(x, y), but, as I can see, c() receives the entire columns x and y instead of the values of one row.

How can one avoid this behavior in R?

Upvotes: 0

Views: 45

Answers (1)

Jilber Urbina

Reputation: 61204

Use this alternative, which passes the data frame and the column names to the function and sums each row with rowSums():

set.seed(505)
dat <- data.frame(x = rnorm(10, mean = 0, sd = 1),
                  y = rnorm(10, mean = 5, sd = 3),
                  z = rnorm(10, mean = 10, sd = 3))

my.sum <- function (df, variables) {
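  # drop = FALSE keeps a data frame even when only one column is selected,
  # so rowSums() does not fail on a plain vector
  return(rowSums(df[, variables, drop = FALSE]))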
}  


dat %>%
  mutate(my_sum = my.sum(., c("x","y")))
#            x         y         z    my_sum
# 1  -1.1211894 -1.620691  3.800424 -2.741880
# 2  -1.2820570  4.263010  8.831353  2.980953
# 3  -2.0393425  3.563943 13.118901  1.524600
# 4  -0.9377324  4.397400 11.522940  3.459667
# 5  -0.5101607  6.440323 12.993764  5.930162
# 6  -0.4128447  6.071003 11.765313  5.658158
# 7  -0.9103679  6.300995  8.811942  5.390627
# 8   0.1407611  9.089672 10.621450  9.230433
# 9   0.5647174  7.472852 10.407413  8.037570
# 10 -0.4322744  2.479842 14.400691  2.047568
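For comparison, a sketch of other ways to get per-row results that are not part of the original answer: a plain vectorised expression, or dplyr's rowwise(), which makes mutate() evaluate the expression once per row so that c(x, y) only contains the current row's values. c_across() assumes dplyr 1.0.0 or later; the data frame `df` is hypothetical.

```r
library(dplyr)

# The question's one-argument version
my.sum <- function(v) {
  sum(v)
}

# Hypothetical toy data
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), z = c(100, 200, 300))

# Option 1: vectorised arithmetic, evaluated once on whole columns
out1 <- df %>% mutate(my_sum = x + y)

# Option 2: rowwise() groups by row, so c(x, y) holds one row's values
out2 <- df %>%
  rowwise() %>%
  mutate(my_sum = my.sum(c(x, y))) %>%
  ungroup()

# Option 3: c_across() picks the columns by name within each row
out3 <- df %>%
  rowwise() %>%
  mutate(my_sum = sum(c_across(c(x, y)))) %>%
  ungroup()

out1$my_sum
# [1] 11 22 33
```

All three give the same sums here. rowwise() calls the function once per row, so on large data frames a vectorised form such as x + y or rowSums() is usually faster.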

Upvotes: 1
