Travis Gerke

Reputation: 344

Why do mutate() and add_column() not accept the same basic arguments?

Often I wish to add a new column at a specific index; mutate() has no simple way to do this, while add_column() does via its .before and .after arguments. I would expect the two functions to behave the same in simple settings, but they do not. Below is an MWE that converts the row index to a new variable. The R documentation does not make clear why these two functions differ in their basic syntax.

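library(dplyr)  # mutate(), row_number(), %>%
library(tibble) # as_tibble(), add_column()

dat <- as_tibble(matrix(rnorm(1e4), nrow=100)) # as_tibble() replaces the deprecated as.tibble()
dat1 <- dat %>% mutate(id=row_number()) # works as expected
dat2 <- dat %>% add_column(id=row_number()) # throws error
dat3 <- dat %>% add_column(id=1:nrow(dat), .before=1) # works, but harder to read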

Upvotes: 5

Views: 2079

Answers (1)

Kevin Arseneau

Reputation: 6264

If you examine the code for these two functions, you will get some clues.

dplyr::mutate

function (.data, ...) 
{
    UseMethod("mutate")
}
<environment: namespace:dplyr>

tibble::add_column

function (.data, ..., .before = NULL, .after = NULL) 
{
    df <- tibble(...)
    if (ncol(df) == 0L) {
        return(.data)
    }
    if (nrow(df) != nrow(.data)) {
        if (nrow(df) == 1) {
            df <- df[rep(1L, nrow(.data)), ]
        }
        else {
            stopc("`.data` must have ", nrow(.data), pluralise_n(" row(s)", 
                nrow(.data)), ", not ", nrow(df))
        }
    }
    extra_vars <- intersect(names(df), names(.data))
    if (length(extra_vars) > 0) {
        stopc(pluralise_msg("Column(s) ", extra_vars), pluralise(" already exist[s]", 
            extra_vars))
    }
    pos <- pos_from_before_after_names(.before, .after, colnames(.data))
    end_pos <- ncol(.data) + seq_len(ncol(df))
    indexes_before <- rlang::seq2(1L, pos)
    indexes_after <- rlang::seq2(pos + 1L, ncol(.data))
    indexes <- c(indexes_before, end_pos, indexes_after)
    .data[end_pos] <- df
    .data[indexes]
}
<environment: namespace:tibble>

First, note that they come from two different packages, although both are part of the tidyverse.

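Second, mutate() is an S3 generic that dispatches to a method based on the class of .data, whereas add_column() is an ordinary convenience function written mostly in base R. It builds the new columns with tibble(...), which evaluates its arguments without reference to .data, so a helper such as row_number() has no data context to work with and fails.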
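To make the difference concrete, here is a minimal sketch of two ways to put a row index in the first column. The small two-column tibble and the names dat_front and dat_front2 are made up for illustration; they are not from the question.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the question's 100 x 100 tibble
dat <- tibble(x = rnorm(5), y = rnorm(5))

# mutate() evaluates row_number() inside the data,
# then select() moves the new column to the front
dat_front <- dat %>%
  mutate(id = row_number()) %>%
  select(id, everything())

# add_column() passes its arguments to tibble(), so the values
# have to be spelled out explicitly rather than derived from .data
dat_front2 <- dat %>%
  add_column(id = seq_len(nrow(dat)), .before = 1)
```

Both calls should give the same column order; the first keeps the data-aware helper, the second keeps the positional argument.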

I'm not sure of the roadmap for either package. However, if no enhancement request exists yet, you could propose one, or fork the project and submit a pull request. It would be a useful addition.

Update

This has been raised in tidyverse/dplyr already and seems to be in the development pipeline, though not yet scheduled.
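For readers on a newer installation: dplyr 1.0.0 and later accept .before and .after directly in mutate(). The sketch below assumes such a version is installed; the toy tibble and the name dat_new are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; assumes dplyr >= 1.0.0
dat <- tibble(x = rnorm(5), y = rnorm(5))

# Create the row index and place it first in a single step
dat_new <- dat %>% mutate(id = row_number(), .before = 1)

names(dat_new)
```

On older versions of dplyr this call does not position the column, so the select() or add_column() approaches remain the fallback.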

Upvotes: 4
