How can I use mutate to average columns selected by name while excluding certain columns from the calculation?

Question

In a data frame, I want to use mutate to create a new column that holds the row-wise mean of all the stat columns except one, chosen by column name. I need to be able to exclude a different column in each call to mutate, and the calculation should skip NA values as well.

Simple version of my DF:

  Team stat1 stat2 stat3 stat4
1  ARI     3    NA     4     6
2  BAL    NA     2    NA     1
3  CAR     5     4     6     2
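
For anyone who wants to reproduce this, the data frame above can be rebuilt roughly as follows. This is a minimal sketch: storing the stats as plain numeric vectors is an assumption, since only the printed output is shown.

```r
# Rebuild the example data frame from the printed output above.
# Assumption: the stat columns are numeric and Team is character.
df <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3, NA, 5),
  stat2 = c(NA, 2, 4),
  stat3 = c(4, NA, 6),
  stat4 = c(6, 1, 2)
)
df
```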

NewCol1 is the row-wise mean of the stat columns, excluding the 'stat1' column and any NA values. NewCol2 is calculated the same way, except that the mean excludes the 'stat2' column:

  Team stat1 stat2 stat3 stat4 NewCol1 NewCol2
1  ARI     3    NA     4     6     5.0    4.33
2  BAL    NA     2    NA     1     1.5    1.00
3  CAR     5     4     6     2     4.0    4.33

What would be the most efficient way to do this if I want to create a new column like this for each stat? The DF has 10 stat columns, all sharing the same name followed by a number. I was thinking the starts_with() function might be of use here together with rowMeans(), but I'm struggling with how to implement that while also excluding a different column each time.

Ronak Shah · Accepted Answer

In base R, you can find the columns that have 'stat' in their names, then drop them one at a time inside lapply and take the row-wise mean of the remaining ones.

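# Indices of the stat columns; anchoring with ^ keeps the remove_* columns from matching on a rerun
cols <- grep('^stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
# For each stat column, average the other stat columns row-wise, skipping NAs
df[new_cols] <- lapply(cols, function(x) rowMeans(df[, setdiff(cols, x), drop = FALSE], na.rm = TRUE))
df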

#  Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
#1  ARI     3    NA     4     6          5.0     4.333333     4.500000          3.5
#2  BAL    NA     2    NA     1          1.5     1.000000     1.500000          2.0
#3  CAR     5     4     6     2          4.0     4.333333     3.666667          5.0
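
Since the question asks about mutate and starts_with(), here is a rough dplyr sketch of the same idea. It assumes dplyr is installed and that every stat column name starts with 'stat'. The names stat_cols, df_one and df_all are only illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3, NA, 5),
  stat2 = c(NA, 2, 4),
  stat3 = c(4, NA, 6),
  stat4 = c(6, 1, 2)
)

# Names of all stat columns, selected by prefix
stat_cols <- names(select(df, starts_with("stat")))

# One-off version: mean of every stat column except stat1
df_one <- df %>%
  mutate(NewCol1 = rowMeans(select(., starts_with("stat"), -stat1), na.rm = TRUE))

# All columns at once: leave each stat column out in turn
df_all <- df
for (col in stat_cols) {
  others <- setdiff(stat_cols, col)
  df_all[[paste0("remove_", col)]] <- rowMeans(select(df, all_of(others)), na.rm = TRUE)
}
df_all
```

Note that rowMeans(..., na.rm = TRUE) returns NaN for a row in which every remaining stat is NA, so that case may need separate handling.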
