Why do mutate() and add_column() not accept the same basic arguments?

Question

I often want to add a new column at a specific position. mutate() has no simple way to do this, while add_column() does, via its .before and .after arguments. I would expect the two functions to behave the same in simple cases, but they do not. Below is an MWE that stores the row index in a new variable. The R documentation does not make it clear: why do these two functions differ in their basic syntax?

library(dplyr)   # mutate(), row_number(), %>%
library(tibble)  # as_tibble(), add_column()

dat <- as_tibble(matrix(rnorm(1e4), nrow = 100))
dat1 <- dat %>% mutate(id = row_number())  # works as expected
dat2 <- dat %>% add_column(id = row_number())  # throws error
dat3 <- dat %>% add_column(id = 1:nrow(dat), .before = 1)  # works, but harder to read

Kevin Arseneau · Accepted Answer

If you examine the code for these two functions, you will get some clues.

dplyr::mutate

function (.data, ...) 
{
    UseMethod("mutate")
}
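As an aside, a body that only calls UseMethod() means the function is an S3 generic: the real work happens in a method chosen by the class of the first argument. Here is a minimal sketch of that mechanism using a made-up generic (describe() and its methods are hypothetical names for illustration, not part of dplyr):

```r
# A toy S3 generic: the body only dispatches, just like dplyr::mutate
describe <- function(x, ...) {
  UseMethod("describe")
}

# Fallback method, used when no class-specific method exists
describe.default <- function(x, ...) {
  "generic object"
}

# Method picked when the first argument inherits from "data.frame"
describe.data.frame <- function(x, ...) {
  paste("data frame with", ncol(x), "columns")
}

describe(data.frame(a = 1, b = 2))  # dispatches to describe.data.frame
describe(1L)                        # falls back to describe.default
```

So printing mutate itself shows only the dispatcher; the code that actually builds the new columns lives in the methods.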

tibble::add_column

function (.data, ..., .before = NULL, .after = NULL) 
{
    df <- tibble(...)
    if (ncol(df) == 0L) {
        return(.data)
    }
    if (nrow(df) != nrow(.data)) {
        if (nrow(df) == 1) {
            df <- df[rep(1L, nrow(.data)), ]
        }
        else {
            stopc("`.data` must have ", nrow(.data), pluralise_n(" row(s)", 
                nrow(.data)), ", not ", nrow(df))
        }
    }
    extra_vars <- intersect(names(df), names(.data))
    if (length(extra_vars) > 0) {
        stopc(pluralise_msg("Column(s) ", extra_vars), pluralise(" already exist[s]", 
            extra_vars))
    }
    pos <- pos_from_before_after_names(.before, .after, colnames(.data))
    end_pos <- ncol(.data) + seq_len(ncol(df))
    indexes_before <- rlang::seq2(1L, pos)
    indexes_after <- rlang::seq2(pos + 1L, ncol(.data))
    indexes <- c(indexes_before, end_pos, indexes_after)
    .data[end_pos] <- df
    .data[indexes]
}

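First, you will note that they come from two different packages, albeit both part of the tidyverse.

Second, you will see that mutate() is an S3 generic: its body only calls UseMethod(), so the actual work is done by class-specific methods. add_column(), by contrast, is more of a convenience function written in base R with some rlang magic. It hands its ... arguments straight to tibble(), which is most likely why a helper such as row_number(), which relies on the context that mutate() sets up, fails there.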
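To make the practical consequence concrete, here is a small sketch, assuming current versions of dplyr and tibble are installed. The column names a to d and the object names are my own choices for illustration. Since add_column() builds its new columns with tibble(...), anything you pass has to be computable without mutate()'s context, so seq_len(nrow(dat)) or tibble's rowid_to_column() are safer choices than a bare row_number():

```r
library(dplyr)
library(tibble)

# Small example data with explicit column names
m <- matrix(rnorm(20), nrow = 5, dimnames = list(NULL, c("a", "b", "c", "d")))
dat <- as_tibble(m)

# row_number() without an argument needs dplyr's data context,
# so inside add_column() it is expected to fail; capture the error
res <- tryCatch(
  add_column(dat, id = row_number()),
  error = function(e) e
)
inherits(res, "error")

# Alternatives that do not depend on that context
dat_a <- add_column(dat, id = seq_len(nrow(dat)), .before = 1)
dat_b <- rowid_to_column(dat, var = "id")
```

Both dat_a and dat_b end up with an integer id column in the first position.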

I'm not sure of the roadmap for either package. However, you could propose an enhancement if one has not already been raised, or fork the project and submit a pull request. It would be a useful addition.

Update

This has been raised in tidyverse/dplyr already and seems to be in the development pipeline, though not yet scheduled.
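For readers on a newer dplyr: to the best of my knowledge, mutate() gained .before and .after arguments in dplyr 1.0.0, so the pattern from the question can be written directly. A minimal sketch, assuming dplyr >= 1.0.0 (the data and the names dat and dat_id are made up for illustration):

```r
library(dplyr)

# Made-up example data
dat <- tibble(a = c(0.1, 0.2, 0.3), b = c("x", "y", "z"))

# Add the row index as the first column in a single mutate() call
dat_id <- dat %>% mutate(id = row_number(), .before = 1)

names(dat_id)
```

On older dplyr versions these arguments are not available, and the add_column() form from the question remains the way to go.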
