shadow

Reputation: 22313

Difference between plyr::mutate and dplyr::mutate

"dplyr::mutate() works the same way as plyr::mutate() and similarly to base::transform(). The key difference between mutate() and transform() is that mutate allows you to refer to columns that you just created." (Introduction to dplyr)
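The transform() side of that difference can be reproduced with base R alone. This is a minimal sketch in which a second argument refers to a column created by the first one; the data frame `df` and the columns `b` and `c` are illustrative names only.

```r
df <- data.frame(a = 2)

# transform() evaluates all of its arguments against the original data,
# so a column created earlier in the same call (b) is not visible to later ones.
out <- tryCatch(
  transform(df, b = a * 2, c = b + 1),
  error = function(e) conditionMessage(e)  # return the error message instead of stopping
)
out
```

With mutate() the same call would succeed, because each argument sees the columns created before it.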

There are some differences between the mutate functions in dplyr and plyr. The main ones are, of course, that plyr::mutate can be applied to lists and that dplyr::mutate is faster.

Moreover, plyr cannot reassign a column that was just created in the same call, whereas dplyr can.

# creating a temporary variable and removing it later
plyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
##   a tmp c
## 1 2   2 4
dplyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
##   a c
## 1 2 4

# creating a temporary variable and changing it later
plyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
##   a b c
## 1 2 2 4
dplyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
##   a b c
## 1 2 1 4

What I am looking for is the behaviour of dplyr's mutate for list objects: a function that mutates a list and can reassign variables created earlier in the same call.

plyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
## 
## $b
## [1] 2
## 
## $c
## [1] 4
dplyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## Error in UseMethod("mutate_") : 
##   no applicable method for 'mutate_' applied to an object of class "list"
# desired_mutate is the (hypothetical) function I am looking for
desired_mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
## 
## $b
## [1] 1
## 
## $c
## [1] 4

I realize that in this simple case I could just use:

plyr::mutate(list(a = 2), c = {b = a; a*b})

But in my actual use case, I assign random numbers to a temporary variable and would like to remove it afterwards. Something like the following:

desired_mutate(list(a = c(1, 2, 5, 2)),
               tmp = runif(length(a)),
               b = tmp * a,
               c = tmp + a,
               tmp = NULL)

Upvotes: 2

Views: 2797

Answers (1)

bergant

Reputation: 7232

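The problem is the for loop in the original mutate function: it iterates over the argument names and looks up each expression by name, and with a duplicated name (such as tmp = a, ..., tmp = NULL) that lookup always returns the first match. Here is a corrected version that iterates over the positions of cols instead of their names: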

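desired_mutate <- function (.data, ...) 
{
  stopifnot(is.data.frame(.data) || is.list(.data) || is.environment(.data))
  # capture the unevaluated arguments as a named list of expressions
  cols <- as.list(substitute(list(...))[-1])
  # silently drop unnamed arguments
  cols <- cols[names(cols) != ""]
  col_names <- names(cols)
  # loop over positions, not names, so that a repeated name
  # (e.g. tmp = ..., tmp = NULL) uses its own expression
  for (i in seq_along(col_names)) {
    if (!is.null(cols[[i]])) {
      # evaluate inside .data, falling back to the caller's environment
      .data[[col_names[i]]] <- eval(cols[[i]], .data, parent.frame())
    } else {
      # name = NULL removes the element
      .data[[col_names[i]]] <- NULL
    }
  }
  .data
}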
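To see why looking up expressions by name fails when a name is repeated, the following sketch captures the arguments with the same substitute() idiom used above. The helper `capture()` is hypothetical, written only for this illustration.

```r
# Hypothetical helper: capture unevaluated arguments like mutate does
capture <- function(...) as.list(substitute(list(...))[-1])

cols <- capture(b = a, c = a * b, b = 1)

names(cols)   # the name "b" appears twice: "b" "c" "b"
cols[["b"]]   # lookup by name returns the FIRST match, the symbol `a`
cols[[3]]     # lookup by position returns the intended expression, 1
```

So a loop over `names(cols)` evaluates `b = a` twice and never sees `b = 1`, while a loop over `seq_along(cols)` visits each argument exactly once.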

Test:

> str( desired_mutate(list(a = c(1, 2, 5, 2)), 
+                tmp = runif(length(a)), 
+                b = tmp * a, 
+                c = tmp + a,
+                tmp = NULL) )
List of 3
 $ a: num [1:4] 1 2 5 2
 $ b: num [1:4] 0.351 1.399 3.096 1.4
 $ c: num [1:4] 1.35 2.7 5.62 2.7
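For comparison, base R's within() also has a method for lists and covers the temporary-variable case: assignments inside the braces can refer to earlier ones, and a variable removed with rm() is left out of the result. This is a swapped-in base R technique, not the function above; set.seed() is added only to make the random draw reproducible.

```r
set.seed(1)  # only for reproducibility of runif()

res <- within(list(a = c(1, 2, 5, 2)), {
  tmp <- runif(length(a))  # temporary random numbers
  b <- tmp * a
  c <- tmp + a
  rm(tmp)                  # drop the temporary variable from the result
})
str(res)
```

Unlike desired_mutate, within() collects the new elements from an environment, so their order in the result is not guaranteed to follow the order of assignment; access them by name.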

Upvotes: 1
