Take the mean of similarly named columns at once in R

Question

In my data.frame below, every column except the first two (person_id and gender) belongs to a group of similarly named columns. For example, there are 7 audio_vocab columns: audio_vocab_01, ..., audio_vocab_07.

I was wondering how I could take the mean of these similarly named columns and replace each group of constituent columns with the resulting column (e.g., instead of the original 7 audio_vocab columns, I just need one audio_vocab_mean column).

How can I do this for all my similarly named columns at once?

w2 <- read.csv('https://raw.githubusercontent.com/izeh/n/master/w2.csv', stringsAsFactors = FALSE)

Ronak Shah · Accepted Answer

We can use split.default to split the data frame into groups of similarly named columns and take the row-wise mean of each group.

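cols <- 1:2          # person_id and gender are kept as they are
temp <- w2[-cols]    # the item columns only
# Group columns by their name with the "_<digits>" suffix removed,
# then average each group row-wise.
cbind(w2[cols], sapply(split.default(temp, 
                   sub('_\\d+', '', names(temp))), rowMeans, na.rm = TRUE))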


#  person_id gender audio_vocab ctest dictation elicited_speech text_vocab
#1         1   MALE       0.837 0.800    0.5011           0.866      0.877
#2         2   MALE       0.909 0.957    0.7348           0.926      0.937
#3         3 FEMALE       0.826 0.737    0.5179           0.771      0.711
#4         4 FEMALE       0.775 0.591    0.5735           0.645      0.736
#5         5   MALE       0.473 0.548    0.0117           0.737      0.704
#6         6  OTHER       0.635 0.729    0.4294           0.669      0.852
#...
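
For readers who want to try the approach without downloading w2.csv, here is a minimal sketch on a toy data frame. The data frame `df` and its values are made up for illustration and are not taken from the real data. The sketch also appends a `_mean` suffix to the new columns, which is an assumption about the naming the question asks for rather than part of the answer above.

```r
# Toy stand-in for w2 (hypothetical values, not the real data)
df <- data.frame(
  person_id = 1:3,
  gender = c("MALE", "FEMALE", "OTHER"),
  audio_vocab_01 = c(1, 0, 1),
  audio_vocab_02 = c(0, 0, 1),
  ctest_01 = c(0.5, NA, 1),
  ctest_02 = c(1.0, 0.5, 0)
)

id_cols <- 1:2
items <- df[-id_cols]

# One group label per column: the name with the trailing "_<digits>" removed
groups <- sub("_\\d+$", "", names(items))

# split.default() splits a data frame by column; rowMeans() averages each group
means <- sapply(split.default(items, groups), rowMeans, na.rm = TRUE)

# Rename the result columns to <group>_mean
colnames(means) <- paste0(colnames(means), "_mean")

result <- cbind(df[id_cols], means)
print(result)
```

Note that sapply only returns a matrix here because there is more than one row; with a single-row data frame it would simplify to a plain vector, so the renaming step would need adjusting.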
