I am trying to apply a certain function to groups of columns of a data frame, based on a 'design' vector that assigns each column to an experimental design group (i.e. replicates). My observations are the rows; my sampling points are the columns.
The design vector designates which columns should group together:
designvector <- c(rep(1,2), rep(2,3), rep(3,3), rep(4,2), rep(5,2), rep(6,2),
rep(7,2), rep(8,2), rep(9,2))
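Splitting the column positions by `designvector` makes this grouping explicit: each list element holds the column indices of one replicate group. This is a small sketch; the name `groups` is only illustrative.

```r
designvector <- c(rep(1, 2), rep(2, 3), rep(3, 3), rep(4, 2), rep(5, 2),
                  rep(6, 2), rep(7, 2), rep(8, 2), rep(9, 2))

# One list element per design group, holding the column positions in that group
groups <- split(seq_along(designvector), designvector)
groups[["2"]]  # columns 3, 4 and 5
```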
A small example of the data frame (abmat.sum) to which I want to apply the function is:
abmat.sum <- structure(list(`1` = c(4381L, 608L, 7648L, 458L, 350L, 203L),
`1` = c(6450L, 1389L, 4896L, 526L, 920L, 352L), `2` = c(1966L,
59L, 492L, 5291L, 1401L, 133L), `2` = c(6338L, 281L, 2649L,
4718L, 1281L, 377L), `2` = c(12399L, 578L, 3094L, 1787L,
1180L, 541L), `3` = c(9629L, 554L, 7299L, 2819L, 1314L, 497L
), `3` = c(11329L, 709L, 3720L, 2909L, 1929L, 655L), `3` = c(11319L,
535L, 5212L, 2191L, 1239L, 633L), `4` = c(7427L, 8637L, 894L,
2L, 782L, 120L), `4` = c(6748L, 9139L, 431L, 28L, 871L, 224L
), `5` = c(7125L, 11819L, 1728L, 9L, 607L, 313L), `5` = c(8651L,
11022L, 442L, 96L, 728L, 249L), `6` = c(17879L, 3402L, 319L,
6L, 1226L, 489L), `6` = c(20859L, 2648L, 463L, 10L, 1189L,
408L), `7` = c(13457L, 1124L, 9386L, 18L, 635L, 367L), `7` = c(16292L,
1732L, 6552L, 20L, 1022L, 431L), `8` = c(9035L, 5887L, 185L,
11L, 550L, 1814L), `8` = c(14831L, 5833L, 570L, 8L, 1089L,
1462L), `9` = c(22023L, 2254L, 5212L, 63L, 555L, 1254L),
`9` = c(16887L, 2491L, 4949L, 68L, 921L, 983L)), .Names = c("1",
"1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "6", "6",
"7", "7", "8", "8", "9", "9"), row.names = c(NA, 6L), class = "data.frame")
However, using ddply I get warnings that I do not really understand:
ddply(abmat.sum,.(designvector),mean)
gives the following output:
designvector V1
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
Warning messages:
1: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
2: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
3: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
4: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
5: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
6: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
7: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
8: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
9: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
I am clueless as to what I am doing wrong here. Any suggestions using ddply, or methods other than for-looping over the data frame, are welcome.
The problem is that abmat.sum is in the wrong form: it is "wide" rather than "long", as required by ddply, which splits the rows of a data frame into groups, not its columns. Use melt to fix that.
library(reshape2)
abmat.sum_long <- melt(abmat.sum)
abmat.sum_long$variable <- as.numeric(abmat.sum_long$variable)
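Because several columns share the same name, the result of `as.numeric(variable)` depends on how `melt` treats those duplicated names. A sketch that does not rely on the names at all builds the long table directly from the design vector. Here `toy` and `design` are small hypothetical stand-ins for `abmat.sum` and `designvector`, with the same kind of duplicated column names.

```r
# Hypothetical stand-in for abmat.sum: 3 rows, duplicated column names
toy <- data.frame(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9), c(10, 11, 12))
names(toy) <- c("1", "1", "2", "2")
design <- c(1, 1, 2, 2)

# unlist() stacks the columns in order, so each column's group label
# is repeated once per row to stay aligned with that column's values
toy_long <- data.frame(
  group = rep(design, each = nrow(toy)),
  value = unlist(toy, use.names = FALSE)
)
```

The resulting `toy_long` can then be passed to `ddply` with `.(group)` in place of `.(variable)`.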
You also need to pass summarise to ddply.
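Passing `mean` directly means `ddply` calls `mean()` on each piece, and each piece is a whole data frame. `mean()` has no data frame method, so it warns "argument is not numeric or logical" and returns `NA`, which matches the output in the question. A minimal illustration with a hypothetical one-column data frame `piece`:

```r
piece <- data.frame(value = c(1, 2, 3))

# mean() on the data frame itself: warns and returns NA
whole <- suppressWarnings(mean(piece))

# mean() on the column, which is what summarise(mean_value = mean(value)) evaluates
by_column <- mean(piece$value)
```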
library(plyr)
ddply(abmat.sum_long, .(variable), summarise, mean_value = mean(value))
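Since the question also asks for approaches other than a for-loop, here is a base-R sketch that works on the wide data directly, using the design vector to pick the columns of each group. `toy` and `design` are hypothetical stand-ins for `abmat.sum` and `designvector`; `split`, `sapply` and `rowMeans` are base R, and the other object names are illustrative only.

```r
# Hypothetical stand-in for abmat.sum, with duplicated column names
toy <- data.frame(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9), c(10, 11, 12))
names(toy) <- c("1", "1", "2", "2")
design <- c(1, 1, 2, 2)

# A numeric matrix, so columns can be picked by position regardless of names
m <- as.matrix(toy)

# Column positions of each design group
groups <- split(seq_len(ncol(m)), design)

# Per-row mean within each group: one column per design group
row_means <- sapply(groups, function(cols) rowMeans(m[, cols, drop = FALSE]))

# Overall mean of all values in each group (the quantity the ddply/summarise call computes)
group_means <- sapply(groups, function(cols) mean(m[, cols]))
```

Any other summary function can be substituted for `rowMeans` or `mean` in the anonymous functions.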