Adam Amin

Reputation: 1456

How to aggregate a data frame by columns and rows?

I have the following data set:

Class  Total  AC  Final_Coverage
A      1000   1   55
A      1000   2   66
B      1000   1   77
A      1000   3   88
B      1000   2   99
C      1000   1   11
B      1000   3   12
B      1000   4   13
B      1000   5   22
C      1000   2   33
C      1000   3   44
C      1000   4   55
C      1000   5   102
A      1000   4   105
A      1000   5   109

I would like to get the average of the AC and the Final_Coverage for the first three rows of each class. Then, I want to store the average values along with the class name in a new dataframe. To do that, I did the following:

dataset <- read_csv("/home/ad/Desktop/testt.csv")

classes <- unique(dataset$Class)
new_data <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))

for(class in classes){
  new_data$Class <- class
  dataClass <- subset(dataset, Class == class)

  tenRows <- dataClass[1:3,]

  coverageMean <- mean(tenRows$Final_Coverage)
  acMean <- mean(tenRows$AC)

  new_data$Coverage <- coverageMean
  new_data$AC <- acMean
}

Everything works fine except storing the averages in new_data. I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "Class", value = "A") : 
  replacement has 1 row, data has 0

Do you know how to solve this?

Upvotes: 4

Views: 46

Answers (2)

jay.sf

Reputation: 72603

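You could look into aggregate(). In this data, AC numbers the rows within each class, so filtering on AC <= 3 selects the first three rows of each class: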

> aggregate(df1[df1$AC <= 3, 3:4], by=list(Class=df1[df1$AC <= 3, 1]), FUN=mean)
  Class AC Final_Coverage
1     A  2       69.66667
2     B  2       62.66667
3     C  2       29.33333

DATA

df1 <- structure(list(Class = structure(c(1L, 1L, 2L, 1L, 2L, 3L, 2L, 
                                          2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"), 
                      Total = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 
                                1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L), 
                      AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 
                             4L, 5L), Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L, 
                                                         12L, 13L, 22L, 33L, 44L, 55L, 102L, 105L, 109L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                               -15L))
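
The AC <= 3 filter above works because AC happens to count the rows within each class. If that were not the case, a minimal base R sketch could derive the within-class position itself. The names pos, first3 and res are introduced only for this illustration, and the data frame is retyped from the question's table:

```r
# Same 15 rows as the table in the question
df1 <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "B", "B", "C", "C", "C", "C", "A", "A"),
  Total = 1000L,
  AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 4L, 5L),
  Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L, 12L, 13L, 22L, 33L,
                     44L, 55L, 102L, 105L, 109L)
)

# Position of every row within its class, in order of appearance
pos <- ave(seq_len(nrow(df1)), df1$Class, FUN = seq_along)

# Keep the first three rows of each class, regardless of what AC contains
first3 <- df1[pos <= 3, ]

# Average both columns per class
res <- aggregate(cbind(AC, Final_Coverage) ~ Class, data = first3, FUN = mean)
res
```

For this data set the result should match the aggregate() call above, since pos and AC coincide.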

Upvotes: 1

FloSchmo

Reputation: 748

With dplyr, this should give you the new data frame:

library(dplyr)

dataset %>%
  group_by(Class) %>%
  slice(1:3) %>%
  summarise(AC = mean(AC),
            Coverage = mean(Final_Coverage))

In your approach, the error occurs because you created new_data with 0 rows and then tried to assign a single value to one of its columns. That is exactly what the message says: the replacement has 1 row, but the data frame has 0. Preallocating one row per class and filling in only the matching row works, though:

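new_data <- data.frame(Class = classes, AC = NA, Coverage = NA)

for(class in classes){
 # Class was set when new_data was created; reassigning it here
 # would overwrite the whole column with the current class
 dataClass <- subset(dataset, Class == class)

 # first three rows of this class
 firstRows <- dataClass[1:3,]

 coverageMean <- mean(firstRows$Final_Coverage)
 acMean <- mean(firstRows$AC)

 # fill only the row that belongs to this class
 new_data$Coverage[classes == class] <- coverageMean
 new_data$AC[classes == class] <- acMean
}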
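
To make the cause of the error concrete, here is a small self-contained sketch. It first reproduces the failing assignment on a zero-row data frame, then shows one possible alternative to preallocating: building a one-row data frame per class and binding them together. The names empty, msg, rows and first_rows are made up for this example, and dataset is retyped from the question's table instead of being read from the CSV file:

```r
# A zero-row data frame, like new_data in the question
empty <- data.frame(Class = character(0), AC = numeric(0))

# Assigning a length-1 value to a column of a zero-row data frame fails;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch({
  empty$Class <- "A"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Data retyped from the question (assumption: same content as testt.csv)
dataset <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "B", "B", "C", "C", "C", "C", "A", "A"),
  Total = 1000L,
  AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 4L, 5L),
  Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L, 12L, 13L, 22L, 33L,
                     44L, 55L, 102L, 105L, 109L)
)

# Build one single-row data frame per class ...
rows <- lapply(unique(dataset$Class), function(cl) {
  first_rows <- head(dataset[dataset$Class == cl, ], 3)
  data.frame(Class = cl,
             AC = mean(first_rows$AC),
             Coverage = mean(first_rows$Final_Coverage))
})

# ... and stack them into the final result
new_data <- do.call(rbind, rows)
new_data
```

Using head(x, 3) rather than x[1:3, ] is a deliberate choice in this sketch: if a class had fewer than three rows, head() would return only the rows that exist instead of padding with NA rows.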

Upvotes: 2
