row average of columns that match string

Question

I have the data frame below, and I want to compute, for each row, the average of all columns whose header ends in R and, separately, the average of all columns whose header ends in G.

The output should have four columns: Rfam, Classes, avg.rowR and avg.rowG.

I have been experimenting with the rowMeans() function, but I am not sure how to specify which columns it should use.

Rfam         Classes  26G      26R      35G      35R      46G      46R      48G      48R      55G      55R
5_8S_rRNA    rRNA     63       39       8        27       26       17       28       43       41       17
5S_rRNA      rRNA     171      149      119      109      681      47       95       161      417      153
7SK          7SK      53       282      748      371      248      42       425      384      316      198
ACA64        Other    7        8        19       2        10       1        36       10       10       4
let-7        miRNA    121825   73207    25259    75080    54301    63510    30444    53800    78961    47533
lin-4        miRNA    10149    16263    5629     19680    11297    37866    3816     9677     11713    10068
Metazoa_SRP  SRP      317      1629     1008     418      1205     407      1116     1225     1413     1075
mir-1        miRNA    3        4        1        2        0        26       1        1        0        4
mir-10       miRNA    912163   1411287  523793   1487160  517017   1466085  107597   551381   727720   788201
mir-101      miRNA    461      320      199      553      174      460      278      297      256      254
mir-103      miRNA    937      419      202      497      318      217      328      343      891      439
mir-1180     miRNA    110      32       4        17       53       47       6        29       35       22
mir-1226     miRNA    11       3        0        3        6        0        1        2        5        4
mir-1237     miRNA    3        2        1        1        0        1        0        2        1        1
mir-1249     miRNA    5        14       2        9        4        5        9        5        7        7
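To reproduce the example, the table can be read straight from text. This is a sketch that assumes the data is plain whitespace-separated text and uses only the first three rows. The name df matches the one used in the answer. Note check.names = FALSE: without it, read.table would rename headers such as 26G to X26G. The suffix-based matching below would still work, but the names would no longer match the table.

```r
# First three rows of the question's table, pasted as text
txt <- "
Rfam    Classes 26G 26R 35G 35R 46G 46R 48G 48R 55G 55R
5_8S_rRNA   rRNA    63  39  8   27  26  17  28  43  41  17
5S_rRNA rRNA    171 149 119 109 681 47  95  161 417 153
7SK 7SK 53  282 748 371 248 42  425 384 316 198
"

# check.names = FALSE keeps headers such as "26G" exactly as written;
# blank lines in the text are skipped by default
df <- read.table(text = txt, header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)
str(df)
```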

Pierre L · Accepted Answer

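# For each suffix pattern, select the matching columns and average them row by row
newcols <- sapply(c("R$", "G$"), function(x) rowMeans(df[grep(x, names(df))]))
# Put the two ID columns next to the averages and give the result its final names
setNames(cbind(df[1:2], newcols), c(names(df)[1:2], "avg.rowR", "avg.rowG"))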
#           Rfam Classes  avg.rowR avg.rowG
# 1    5_8S_rRNA    rRNA      28.6     33.2
# 2      5S_rRNA    rRNA     123.8    296.6
# 3          7SK     7SK     255.4    358.0
# 4        ACA64   Other       5.0     16.4
# 5        let-7   miRNA   62626.0  62158.0
# 6        lin-4   miRNA   18710.8   8520.8
# 7  Metazoa_SRP     SRP     950.8   1011.8
# 8        mir-1   miRNA       7.4      1.0
# 9       mir-10   miRNA 1140822.8 557658.0
# 10     mir-101   miRNA     376.8    273.6
# 11     mir-103   miRNA     383.0    535.2
# 12    mir-1180   miRNA      29.4     41.6
# 13    mir-1226   miRNA       2.4      4.6
# 14    mir-1237   miRNA       1.4      1.0
# 15    mir-1249   miRNA       8.0      5.4

One way to match patterns in column names is with the grep family of functions. The call grep("R$", names(df)) returns the positions of all column names that end in R (the $ anchors the match to the end of the name, so Rfam is not matched). Using it inside sapply lets us search for the R and G columns in a single expression.
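As a quick check of what grep() returns, the sketch below runs it on a character vector holding the question's column names. The names are typed in by hand here, so no data frame is needed.

```r
# Column names of the question's data frame (as read with check.names = FALSE)
nms <- c("Rfam", "Classes", "26G", "26R", "35G", "35R",
         "46G", "46R", "48G", "48R", "55G", "55R")

# "R$" matches an R at the end of the string, so "Rfam" is not selected
idx_R <- grep("R$", nms)
idx_R
# [1]  4  6  8 10 12

# value = TRUE returns the matching names instead of their positions
grep("G$", nms, value = TRUE)
# [1] "26G" "35G" "46G" "48G" "55G"

# sapply() repeats the search for each pattern; both results have length 5,
# so they are simplified into a 5 x 2 matrix with columns "R$" and "G$"
idx <- sapply(c("R$", "G$"), function(p) grep(p, nms))
idx
```

Because sapply names its result after the input vector, the columns of newcols in the answer are also named R$ and G$ until they are renamed.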

The core of the second line is cbind(df[1:2], newcols), which binds the first two columns of df to the two new columns of row means. Wrapping that in setNames(..., c(names(df)[1:2], "avg.rowR", "avg.rowG")) renames the columns to match your desired output. Without it, the new columns would keep the pattern names R$ and G$.
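An alternative to the setNames() step is to name the new columns as they are created. The sketch below uses data.frame() in place of cbind() plus setNames(). It is a variation on the answer, not the answer's own code, and it uses only the first two rows of the question's data, typed in by hand so the example is self-contained.

```r
# First two rows of the question's data; check.names = FALSE keeps "26G" etc.
df <- data.frame(
  Rfam    = c("5_8S_rRNA", "5S_rRNA"),
  Classes = c("rRNA", "rRNA"),
  `26G` = c(63, 171), `26R` = c(39, 149),
  `35G` = c(8, 119),  `35R` = c(27, 109),
  `46G` = c(26, 681), `46R` = c(17, 47),
  `48G` = c(28, 95),  `48R` = c(43, 161),
  `55G` = c(41, 417), `55R` = c(17, 153),
  check.names = FALSE
)

# Naming the columns inside data.frame() makes a separate renaming step unnecessary
out <- data.frame(
  df[1:2],
  avg.rowR = rowMeans(df[grep("R$", names(df))]),
  avg.rowG = rowMeans(df[grep("G$", names(df))])
)
out
```

This gives 28.6 and 123.8 for avg.rowR and 33.2 and 296.6 for avg.rowG, the same values as the first two rows of the answer's output. The trade-off is that each pattern is written out once per column rather than looped over with sapply.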
