Mike

Reputation: 1141

Create an output matrix or data frame using apply()

I have a frequency cross-tab and would like to use rep() inside an apply()-family function to expand each sample column (A01, A02, etc.) into one long vector that I can use for mean and standard deviation stats. The numbers in columns A01, A02, etc. are frequency counts of CAG, e.g. A01 has 6485 counts of 13 CAG.

I've managed to write a function that gives the correct results, but the output doesn't appear to be indexable by sample, e.g. sumstats$A01 gives NULL. Ideally I'd also like the rows and columns swapped in the output table, so that the columns are mean, sd, etc. and each row is a sample.

data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))
sumstats <- sapply(data[, 2:ncol(data)], function(x) {
  data_e <- rep(data$CAG, x)

  list(
    mean   = mean(data_e),
    median = median(data_e),
    sd     = sd(data_e)
  )
})

#Output:
#sumstats$A01
#NULL

Upvotes: 0

Views: 1324

Answers (2)

Evan Friedland

Reputation: 3194

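The $ operator looks elements up by name in lists and data frames; it does not select columns of a matrix. If you check class(sumstats) you will see it is a matrix (here a matrix of list elements, because the function returns a list), so sumstats$A01 finds no matching name and returns NULL.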
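To see why $ comes back empty, it can help to inspect what sapply() actually built. The sketch below reuses the data and function from the question; the only change is data[, -1], used here as shorthand for data[, 2:ncol(data)].

```r
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))

sumstats <- sapply(data[, -1], function(x) {
  data_e <- rep(data$CAG, x)
  list(mean = mean(data_e), median = median(data_e), sd = sd(data_e))
})

is.matrix(sumstats)        # TRUE: sapply() simplified the results to a matrix
is.list(sumstats)          # TRUE: each cell holds one list element
is.null(names(sumstats))   # TRUE: a matrix has dimnames but no names(),
                           # and names() is what $ matches against

# Matrix-style indexing by row and column name still works
sumstats[["mean", "A01"]]
```

Double-bracket indexing like this pulls out a single value without converting the object first.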

Simply run sumstats <- as.data.frame(sumstats) and then you can use

sumstats$A01
#$mean
#[1] 13.04495
#
#$median
#[1] 13
#
#$sd
#[1] 0.2874512

Is this what you wanted?

EDIT: with samples and sumheight defined as in your follow-up code:

sumstats2 <- as.data.frame(t(sumstats))
res <- data.frame(samples, sumheight, sumstats2)
res
#    samples sumheight     mean median        sd
#A01     A01      6652 13.04495     13 0.2874512
#A02     A02        98 14.57143     15  0.497416
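If the list columns are not needed, one alternative is to have the function return a named numeric vector instead of a list, so that sapply() builds an ordinary numeric matrix that can be transposed directly. This is a sketch of that variation, not the code from the question; stats is just an illustrative name.

```r
data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))

# Returning c(...) instead of list(...) makes sapply() produce a numeric matrix
# with one row per statistic and one column per sample.
stats <- sapply(data[, -1], function(x) {
  data_e <- rep(data$CAG, x)   # expand the counts into one value per observation
  c(mean = mean(data_e), median = median(data_e), sd = sd(data_e))
})

# t() flips the matrix so that samples are rows and statistics are columns.
res <- data.frame(
  samples   = colnames(stats),
  sumheight = colSums(data[, -1], na.rm = TRUE),
  t(stats)
)

res$mean   # plain numeric vector, one mean per sample
```

With this layout every column of res is atomic, so res$mean, res$sd and so on can be passed straight to other functions.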

Upvotes: 1

Mike

Reputation: 1141

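data <- data.frame(CAG = c(13, 14, 15), A01 = c(6485, 35, 132), A02 = c(0, 42, 56))

samples <- c('A01', 'A02')
# Total number of counts per sample
sumheight <- colSums(data[, 2:ncol(data)], na.rm = TRUE)

sumstats <- sapply(data[, 2:ncol(data)], function(x) {
  # Expand the frequency counts into one CAG value per observation
  data_e <- rep(data$CAG, x)

  list(
    mean   = mean(data_e),
    median = median(data_e),
    sd     = sd(data_e)
  )
})

# Transpose so that samples are rows and statistics are columns
sumstats2 <- as.data.frame(t(sumstats))
# sumstats2$mean is a list column, so flatten it to a numeric vector first
res <- data.frame(samples, sumheight, mean = unlist(sumstats2$mean))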

Upvotes: 0
