Apply a function by row on a dataframe independently of the number of columns

Question

I'd like to apply a function by row on a data.frame to concatenate the column names depending on the values in each row. Given the first data.frame below (df), I want the second one, with a Result column added.

df 
      A     B
1  TRUE  TRUE
2 FALSE  TRUE
3 FALSE FALSE

      A     B Result
1  TRUE  TRUE A / B
2 FALSE  TRUE   B
3 FALSE FALSE NA

I read about dplyr using mutate() and rowwise(), but I don't know how to apply them since the columns aren't fixed.

For a row i, I would do something like:

paste(names(df)[as.logical(df[i,])], collapse = ' / ')

Any help would be welcome.

Thank you.

akrun · Accepted Answer

If the dataset is not very large (i.e. not millions or billions of rows), we can use apply with MARGIN = 1 to loop over the rows, subset the names of each row vector using its logical values as the index, and paste them together. This fits in a single line:

df$Result <- apply(df, 1, FUN = function(x) paste(names(x)[x], collapse=" / "))
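One detail worth checking against the expected output in the question: paste() with collapse returns an empty string when no names are selected, so a row with no TRUE values gives "" rather than NA. A minimal sketch of one way to handle that, assuming the example df from the question and a temporary vector res (a name introduced here for illustration only):

```r
# Example data from the question.
df <- data.frame(A = c(TRUE, FALSE, FALSE), B = c(TRUE, TRUE, FALSE))

# Paste the names of the TRUE columns in each row.
res <- apply(df, 1, function(x) paste(names(x)[x], collapse = " / "))

# Rows with no TRUE collapse to "", so map those to NA explicitly.
res[res == ""] <- NA

df$Result <- unname(res)
```

This works for any number of logical columns, but it should be run before Result is added, since apply() would otherwise see the character column as well.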

However, if we have a big dataset, another option is to create a key/value lookup covering every TRUE/FALSE combination and replace the values by matching, which is faster than the solution above.

v1 <- do.call(paste, df)
unname(setNames(c("A / B", "B", "A", NA), do.call(paste, 
          expand.grid(rep(list(c(TRUE, FALSE)), 2))))[v1])
#[1] "A / B" "B"     NA

Or we can use an arithmetic operation: each row is turned into the index 1 + A + 2 * B, which picks the matching label from a vector.

c(NA, "A", "B", "A / B")[1 + df[,1] + 2 * df[,2]]
#[1] "A / B" "B"     NA
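The arithmetic lookup above is written for exactly two columns. Since the question asks for something independent of the number of columns, here is a rough sketch of how the same idea might be generalized. The helper name label_rows and its sep argument are hypothetical, and the sketch assumes every column is logical with no NA values. It builds 2^n labels, so it is only sensible for a modest number of columns.

```r
# Example data from the question.
df <- data.frame(A = c(TRUE, FALSE, FALSE), B = c(TRUE, TRUE, FALSE))

# Hypothetical helper: encode each row as a binary number, then look up its label.
label_rows <- function(df, sep = " / ") {
  n <- ncol(df)
  # Every TRUE/FALSE combination, with the first column varying fastest.
  combos <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), n)))
  labels <- apply(combos, 1, function(x) paste(names(df)[x], collapse = sep))
  labels[labels == ""] <- NA
  # Column j contributes 2^(j - 1), which matches the row order of 'combos'.
  weights <- 2^(seq_len(n) - 1)
  idx <- 1 + as.vector((as.matrix(df) + 0) %*% weights)
  unname(labels[idx])
}

label_rows(df)
```

For the two-column example the generated label vector is the same c(NA, "A", "B", "A / B") used in the answer, so the result should match the hard-coded version.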

Benchmarks

Using @DavidArenburg's dataset and including the two solutions posted here (after changing the column names of 'df' to 'A' and 'B'):

newPaste <- function(df) {
  v1 <- do.call(paste, df)
  unname(setNames(c("A / B", "B", "A", NA), do.call(paste,
      expand.grid(rep(list(c(TRUE, FALSE)), 2))))[v1])
}

arith <- function(df) {
  c(NA, "A", "B", "A / B")[1 + df[,1] + 2 * df[,2]]
}

microbenchmark::microbenchmark(Rowwise(df), Colwise(df), newPaste(df), arith(df))
#Unit: milliseconds
#        expr        min        lq      mean     median         uq       max neval
#  Rowwise(df) 398.024791 453.68129 488.07312 481.051431 523.466771 688.36084   100
#  Colwise(df)  25.361609  28.10300  34.20972  30.952365  35.885061  95.92575   100
# newPaste(df)  65.777304  69.07432  82.08602  71.606890  82.232980 176.66516   100
#   arith(df)   1.790622   1.88339   4.74913   2.027674   4.753279  58.50942   100
