Reputation: 344
Often I want to add a new column at a specific index. mutate() has no simple way to do this, while add_column() does, via its .before and .after arguments. I would expect the two functions to behave the same in simple settings, but they do not. Below is a MWE that adds the row index as a new variable. The R documentation does not make this clear: why do these two functions differ in their basic syntax?
library(dplyr)
library(tibble)
dat <- as_tibble(as.data.frame(matrix(rnorm(1e4), nrow = 100)))
dat1 <- dat %>% mutate(id = row_number())                  # works as expected
dat2 <- dat %>% add_column(id = row_number())              # throws an error
dat3 <- dat %>% add_column(id = 1:nrow(dat), .before = 1)  # works, but harder to read
Upvotes: 5
Views: 2079
Reputation: 6264
If you examine the code for these two functions, you will get some clues.
> dplyr::mutate
function (.data, ...)
{
    UseMethod("mutate")
}
<environment: namespace:dplyr>
> tibble::add_column
function (.data, ..., .before = NULL, .after = NULL)
{
    df <- tibble(...)
    if (ncol(df) == 0L) {
        return(.data)
    }
    if (nrow(df) != nrow(.data)) {
        if (nrow(df) == 1) {
            df <- df[rep(1L, nrow(.data)), ]
        }
        else {
            stopc("`.data` must have ", nrow(.data), pluralise_n(" row(s)",
                nrow(.data)), ", not ", nrow(df))
        }
    }
    extra_vars <- intersect(names(df), names(.data))
    if (length(extra_vars) > 0) {
        stopc(pluralise_msg("Column(s) ", extra_vars), pluralise(" already exist[s]",
            extra_vars))
    }
    pos <- pos_from_before_after_names(.before, .after, colnames(.data))
    end_pos <- ncol(.data) + seq_len(ncol(df))
    indexes_before <- rlang::seq2(1L, pos)
    indexes_after <- rlang::seq2(pos + 1L, ncol(.data))
    indexes <- c(indexes_before, end_pos, indexes_after)
    .data[end_pos] <- df
    .data[indexes]
}
<environment: namespace:tibble>
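The recycling branch in the add_column() source (a one-row input is repeated with rep(1L, nrow(.data)), any other length mismatch is an error) can be checked directly. A minimal sketch, assuming a recent tibble is installed; the toy data frame and the column names flag and id are illustrative only, and the error is caught rather than quoted because its wording differs between tibble versions.

```r
library(tibble)

# Small toy data: 3 rows, 2 columns
dat <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))

# A length-1 value becomes a 1-row tibble and is recycled to nrow(dat)
with_flag <- add_column(dat, flag = TRUE)
print(with_flag)

# A value whose length equals nrow(dat) is added as-is, here as the first column
with_id <- add_column(dat, id = seq_len(nrow(dat)), .before = 1)
print(with_id)

# Any other length is rejected; capture the error message instead of stopping
res <- tryCatch(
  add_column(dat, id = 1:2),
  error = function(e) conditionMessage(e)
)
print(res)
```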
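First, note that they come from two different packages, dplyr and tibble, albeit both part of the tidyverse.
Second, mutate() is an S3 generic that dispatches to a class-specific method, whereas add_column() is an ordinary convenience function written largely in base R with a little rlang. In particular, add_column() evaluates its ... arguments with tibble(...), outside dplyr's data mask, so a context-dependent helper such as row_number() (which, called with no argument, relies on the row count supplied by the surrounding dplyr verb) has no data to refer to and throws an error.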
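Because add_column() builds the new columns with tibble(...) (the first line of its body printed above), row_number() is evaluated outside dplyr's data mask. The sketch below, assuming dplyr and tibble are installed, reproduces the failure and shows two workarounds that keep the pipe readable: magrittr's `.` placeholder to compute the ids from the piped data, and mutate() followed by select(id, everything()) to move the column to the front. The small data set is illustrative only, and the error is caught rather than quoted because its wording varies across dplyr versions.

```r
library(dplyr)
library(tibble)

set.seed(1)
dat <- as_tibble(as.data.frame(matrix(rnorm(20), nrow = 5)))

# Inside mutate(), row_number() can see the data it is applied to
ok <- dat %>% mutate(id = row_number())

# add_column() evaluates `...` via tibble(), outside any data mask, so this fails
failed <- tryCatch(
  dat %>% add_column(id = row_number()),
  error = function(e) TRUE
)

# Workaround 1: compute the ids from the piped data via magrittr's `.` placeholder
ids_first <- dat %>% add_column(id = seq_len(nrow(.)), .before = 1)

# Workaround 2: let mutate() create the column, then move it to the front
ids_first2 <- dat %>% mutate(id = row_number()) %>% select(id, everything())
```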
I'm not sure of the roadmap for either package; however, you could propose an enhancement if one has not already been raised, or fork the project and submit a pull request. It would be a useful addition.
This has been raised in tidyverse/dplyr already and seems to be in the development pipeline, though not yet scheduled.
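For readers on newer versions: dplyr 1.0.0 added .before and .after arguments to mutate() (marked experimental at the time), which covers the original use case directly. A minimal sketch, assuming dplyr 1.0.0 or later; the toy data is illustrative only.

```r
library(dplyr)
library(tibble)

dat <- tibble(x = c(10, 20, 30), y = c("a", "b", "c"))

# New column placed first, by position
dat_id_first <- dat %>% mutate(id = row_number(), .before = 1)

# New column placed immediately after a named column
dat_id_after_x <- dat %>% mutate(id = row_number(), .after = x)

print(dat_id_first)
print(dat_id_after_x)
```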
Upvotes: 4