Reputation: 1141
I've got a frequency cross-tab and would like to use rep() with an apply() function to make a long column of data for each sample (A01, A02 etc.) that I can use for mean and stdev stats. The numbers in columns A01, A02 etc. are frequency counts of CAG, e.g. 6485 counts of 13 CAG.
I've managed to write the function to give the correct results, but the format doesn't appear to be indexable, e.g. using sumstats$A01 gives NULL. I'd also ideally like the rows and columns inverted in the output table, so columns are mean, sd etc.
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))
sumstats <- sapply(data[, 2:ncol(data)], function(x) {
  data_e <- rep(data$CAG, x)
  list(
    mean = mean(data_e),
    median = median(data_e),
    sd = sd(data_e)
  )
})
#Output:
#sumstats$A01
#NULL
Upvotes: 0
Views: 1324
Reputation: 3194
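The $ operator looks elements up by name, which works on a list or a data.frame. If you check class(sumstats) you will see it is a matrix: sapply() has simplified the per-column lists into a matrix of list elements, which has dimnames but no names, so sumstats$A01 returns NULL.
Simply run sumstats <- as.data.frame(sumstats) and then you can use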
sumstats$A01
#$mean
#[1] 13.04495
#
#$median
#[1] 13
#
#$sd
#[1] 0.2874512
Is this what you wanted?
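If it helps to see what is going on before converting, here is a small sketch using the data from the question (base R only). It inspects the object that sapply() returns and shows that matrix-style indexing already works on it; this is just one way to poke at it, not the only one.

```r
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))

sumstats <- sapply(data[, -1], function(x) {
  data_e <- rep(data$CAG, x)
  list(mean = mean(data_e), median = median(data_e), sd = sd(data_e))
})

# sapply() simplified the per-column lists into a 3 x 2 matrix of mode "list"
is.matrix(sumstats)
typeof(sumstats)
dimnames(sumstats)

# The matrix has dimnames but no names attribute, so `$` has nothing to match
is.null(names(sumstats))
is.null(sumstats$A01)

# Matrix-style indexing works without any conversion
sumstats[, "A01"]            # list holding mean, median and sd for sample A01
sumstats[["mean", "A01"]]    # a single numeric value
```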
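EDIT: With samples and sumheight defined as in your updated code, transpose the matrix before converting it:
sumstats2 <- as.data.frame(t(sumstats))
res <- data.frame(samples, sumheight, sumstats2)
res
#    samples sumheight     mean median        sd
#A01     A01      6652 13.04495     13 0.2874512
#A02     A02        98 14.57143     15  0.497416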
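One thing to be aware of: because the function returns a list(), the columns of the transposed data frame are list columns, so something like sumstats2$mean is a list rather than a numeric vector. A possible alternative, sketched below under the assumption that plain numeric columns are wanted, is to return a named numeric vector with c() and use vapply(). The names counts and stats are only illustrative.

```r
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))
counts <- data[, -1]   # frequency columns only (illustrative name)

# Returning c(...) instead of list(...) gives a plain numeric 3 x n matrix;
# FUN.VALUE tells vapply() to expect three numeric values per sample
stats <- vapply(counts, function(x) {
  data_e <- rep(data$CAG, x)
  c(mean = mean(data_e), median = median(data_e), sd = sd(data_e))
}, FUN.VALUE = c(mean = 0, median = 0, sd = 0))

# Transpose so that samples are rows and statistics are columns
res <- data.frame(
  samples   = names(counts),
  sumheight = colSums(counts, na.rm = TRUE),
  t(stats)
)

res
is.numeric(res$mean)   # the statistics are ordinary numeric columns here
```

With numeric columns, res$mean can be passed straight to other functions without unlist().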
Upvotes: 1
Reputation: 1141
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))
samples <- c('A01', 'A02')
sumheight <- colSums(data[, 2:ncol(data)], na.rm = TRUE)
sumstats <- sapply(data[, 2:ncol(data)], function(x) {
  data_e <- rep(data$CAG, x)
  list(
    mean = mean(data_e),
    median = median(data_e),
    sd = sd(data_e)
  )
})
sumstats2 <- as.data.frame(t(sumstats))
res <- data.frame(samples, sumheight, sumstats2$mean)
Upvotes: 0