ChazC
ChazC

Reputation: 61

Want to mutate columns that average columns together based on column names, but also excludes certain columns from the calculation?

Working in a data frame, I want to create a new column using mutate that averages all columns in each row together, besides one, based on column name. I need to be able to exclude a certain column in each use of mutate, and I'd like the calculation to skip over NA values as well.

Simple version of my DF:

   Team stat1 stat2 stat3 stat4
1  ARI     3    NA     4     6
2  BAL    NA     2    NA     1
3  CAR     5     4     6     2

NewCol1 created from calculating mean of stat columns, excluding 'stat 1' column and NA values. Same thing done for NewCol2, calculated mean excludes the 'stat2' column:

  Team stat1 stat2 stat3 stat4 NewCol1 NewCol2
1  ARI     3    NA     4     6     5.0    4.33
2  BAL    NA     2    NA     1     1.5    1.00
3  CAR     5     4     6     2     4.0    4.33

What would be the most efficient way to do this if I want to create new columns that do the same thing for each stat? The DF has 10 stat columns, each with the same name and then a number after each name. I was thinking the starts_with() function might be of use here with rowMeans(), but struggling with how I'd implement that while also excluding a certain column each time.

Upvotes: 1

Views: 938

Answers (2)

akrun
akrun

Reputation: 887851

We can use rowMeans after selecting out the relevant columns

library(dplyr)
df1 %>%
      mutate(NewCol1 = rowMeans(select(., -Team, -stat1), na.rm = TRUE),
        NewCol2 = rowMeans(select(., -Team, -stat2), na.rm = TRUE))

-output

#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2
#1  ARI     3    NA     4     6     5.0 4.333333
#2  BAL    NA     2    NA     1     1.5 1.000000
#3  CAR     5     4     6     2     4.0 4.333333

Or another option with c_across

df1 %>% 
   rowwise %>%
   mutate(NewCol1 = mean(c_across(c(where(is.numeric), -stat1)), na.rm = TRUE), 
   NewCol2 = mean(c_across(c(starts_with('stat'), -stat2)), na.rm = TRUE), 
   NewCol3 = mean(c_across(c(starts_with('stat'), -stat3)), na.rm = TRUE), 
   NewCol4 = mean(c_across(c(starts_with('stat'), -stat4)), na.rm = TRUE)) %>%
   ungroup

-output

# A tibble: 3 x 9
#  Team  stat1 stat2 stat3 stat4 NewCol1 NewCol2 NewCol3 NewCol4
#  <chr> <int> <int> <int> <int>   <dbl>   <dbl>   <dbl>   <dbl>
#1 ARI       3    NA     4     6     5      4.33    4.5      3.5
#2 BAL      NA     2    NA     1     1.5    1       1.5      2  
#3 CAR       5     4     6     2     4      4.33    3.67     5  

If we want to do this automatically, an option is

library(purrr)
df1[paste0("NewCol", 1:2)] <-  map(c('stat1', 'stat2'),
                       ~ df1 %>%
                             select(starts_with('stat'), -.x) %>%
                             rowMeans(., na.rm = TRUE))

Or to create columns 1 to 4

nm1 <- names(df1)[startsWith(names(df1), 'stat')]
df1[paste0("NewCol", seq_along(nm1))] <-  map(nm1,
                       ~ df1 %>%
                             select(starts_with('stat'), -.x) %>%
                             rowMeans(., na.rm = TRUE))

-output

df1
#   Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0

Or to do this fully in tidyverse

library(stringr)
map_dfc(nm1,  ~
    df1 %>% 
       select(starts_with('stat'), -.x) %>% 
       transmute(!! str_c('NewCol', readr::parse_number(.x)) := 
              rowMeans(., na.rm = TRUE))) %>% 
       bind_cols(df1, .)
#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0

Or using rowwise/c_across

map_dfc(nm1,  ~
     df1 %>% 
        select(starts_with('stat'), -.x) %>% rowwise %>%
        transmute(!! str_c('NewCol', readr::parse_number(.x)) :=   mean(c_across(everything()), na.rm = TRUE))) %>%
        ungroup %>%
    bind_cols(df1, .)

-output

#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0

Or using base R

df1[paste0("NewCol", seq_along(nm1))] <- lapply(nm1,
            function(x) rowMeans(df1[setdiff(names(df1)[-1], x)],  na.rm = TRUE))

data

df1 <- structure(list(Team = c("ARI", "BAL", "CAR"), stat1 = c(3L, NA, 
5L), stat2 = c(NA, 2L, 4L), stat3 = c(4L, NA, 6L), stat4 = c(6L, 
1L, 2L)), class = "data.frame", row.names = c("1", "2", "3"))

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389235

In base R, you can find the columns which has 'stat' in it and one by one remove it from lapply and take row-wise mean of it.

cols <- grep('stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
df[new_cols] <- lapply(cols, function(x) rowMeans(df[, -c(1, x)], na.rm = TRUE))
df

#  Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
#1  ARI     3    NA     4     6          5.0     4.333333     4.500000          3.5
#2  BAL    NA     2    NA     1          1.5     1.000000     1.500000          2.0
#3  CAR     5     4     6     2          4.0     4.333333     3.666667          5.0

Upvotes: 2

Related Questions