Reputation:
In my data.frame below, all columns except the first two (person_id and gender) are grouped by name. For example, there are 7 audio_vocab columns: audio_vocab_01, ..., audio_vocab_07.
I was wondering how I could take the mean of each group of similarly named columns and replace the group's constituent columns with the resulting column (e.g., instead of the original 7 audio_vocab columns, I just need one audio_vocab_mean column).
How can I do this for all my similarly named columns at once?
w2 <- read.csv('https://raw.githubusercontent.com/izeh/n/master/w2.csv', stringsAsFactors = FALSE)
Upvotes: 2
Views: 44
Reputation: 887088
We can loop over the unique column-name prefixes (the names with the trailing number removed) and take the rowMeans of each group of columns:
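# Unique column-name prefixes, with the trailing "_<digits>" removed
un1 <- unique(sub("_\\d+$", "", names(w2)[-(1:2)]))
# Row means of each group of columns, bound to the two id columns
out <- cbind(w2[1:2], do.call(cbind, setNames(lapply(un1,
    function(nm) rowMeans(w2[startsWith(names(w2), nm)], na.rm = TRUE)), un1)))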
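To try the idea without downloading w2, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are invented for illustration and are not taken from the original data. It uses the same prefix-extraction approach and, as an addition, appends the _mean suffix that the question asks for.

```r
# Toy data: invented values that only mimic the layout of w2
toy <- data.frame(
  person_id      = 1:3,
  gender         = c("MALE", "FEMALE", "OTHER"),
  audio_vocab_01 = c(1, 0, 1),
  audio_vocab_02 = c(0, 0, 1),
  ctest_01       = c(0.5, 1, NA),
  ctest_02       = c(1, 1, 0.5)
)

# Unique prefixes of the measurement columns (trailing "_<digits>" removed)
prefixes <- unique(sub("_\\d+$", "", names(toy)[-(1:2)]))

# Row means per prefix; appending "_" keeps a prefix from matching a longer one
means <- lapply(prefixes, function(nm) {
  rowMeans(toy[startsWith(names(toy), paste0(nm, "_"))], na.rm = TRUE)
})
names(means) <- paste0(prefixes, "_mean")

# Keep the two id columns and add one mean column per group
toy_out <- cbind(toy[1:2], as.data.frame(means))
print(toy_out)
```

Matching on the prefix plus "_" is a small precaution: it would matter only if one group name were the start of another, which does not appear to be the case in w2.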
Upvotes: 0
Reputation: 388972
We can use split.default to split the similarly named columns into groups and take the row-wise mean of each group.
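cols <- 1:2        # id columns to keep as-is
temp <- w2[-cols]  # measurement columns only
# Group columns by name prefix, then take row means within each group
cbind(w2[cols], sapply(split.default(temp,
    sub('_\\d+', '', names(temp))), rowMeans, na.rm = TRUE))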
#  person_id gender audio_vocab ctest dictation elicited_speech text_vocab
#1         1   MALE       0.837 0.800    0.5011           0.866      0.877
#2         2   MALE       0.909 0.957    0.7348           0.926      0.937
#3         3 FEMALE       0.826 0.737    0.5179           0.771      0.711
#4         4 FEMALE       0.775 0.591    0.5735           0.645      0.736
#5         5   MALE       0.473 0.548    0.0117           0.737      0.704
#6         6  OTHER       0.635 0.729    0.4294           0.669      0.852
#...
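The output above names each column by its bare prefix. If the _mean suffix from the question is wanted, one possible extension is to rename the columns of the sapply result before binding. The sketch below runs on a small invented data frame (toy and its values are hypothetical, not from w2) and also shows what split.default returns.

```r
# Toy data: invented values that only mimic the layout of w2
toy <- data.frame(
  person_id      = 1:2,
  gender         = c("MALE", "FEMALE"),
  audio_vocab_01 = c(1, 0),
  audio_vocab_02 = c(0, 0),
  ctest_01       = c(0.5, 1),
  ctest_02       = c(1, 1)
)

meas <- toy[-(1:2)]

# split.default treats the data frame as a list of columns and returns
# one smaller data frame per prefix
groups <- split.default(meas, sub("_\\d+$", "", names(meas)))
names(groups)  # "audio_vocab" "ctest"

# One column of row means per group, returned as a matrix
means <- sapply(groups, rowMeans, na.rm = TRUE)

# Add the "_mean" suffix requested in the question
colnames(means) <- paste0(colnames(means), "_mean")

toy_out <- cbind(toy[1:2], means)
print(toy_out)
```

Note that sapply simplifies to a matrix only when the data has more than one row; for a single-row data frame it would return a plain vector, so the renaming step would need adjusting.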
Upvotes: 1