Reputation: 1456
I have the following data set:
Class Total AC Final_Coverage
A 1000 1 55
A 1000 2 66
B 1000 1 77
A 1000 3 88
B 1000 2 99
C 1000 1 11
B 1000 3 12
B 1000 4 13
B 1000 5 22
C 1000 2 33
C 1000 3 44
C 1000 4 55
C 1000 5 102
A 1000 4 105
A 1000 5 109
I would like to get the average of the AC and the Final_Coverage for the first three rows of each class. Then, I want to store the average values along with the class name in a new dataframe. To do that, I did the following:
dataset <- read_csv("/home/ad/Desktop/testt.csv")
classes <- unique(dataset$Class)
new_data <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))
for(class in classes){
new_data$Class <- class
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage <- coverageMean
new_data$AC <- acMean
}
Everything works fine except for writing the average values into the new_data frame. I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "Class", value = "A") :
replacement has 1 row, data has 0
Do you know how to solve this?
Upvotes: 4
Views: 46
Reputation: 72603
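You could look into aggregate(). Note that filtering on AC <= 3 picks the first three rows of each class only because AC numbers the rows within each class in this data.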
> aggregate(df1[df1$AC <= 3, 3:4], by=list(Class=df1[df1$AC <= 3, 1]), FUN=mean)
Class AC Final_Coverage
1 A 2 69.66667
2 B 2 62.66667
3 C 2 29.33333
DATA
df1 <- structure(list(Class = structure(c(1L, 1L, 2L, 1L, 2L, 3L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
Total = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L),
AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L,
4L, 5L), Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L,
12L, 13L, 22L, 33L, 44L, 55L, 102L, 105L, 109L)), class = "data.frame", row.names = c(NA,
-15L))
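If AC did not happen to count the rows within each class, the first three rows could be selected by position instead. A minimal sketch, assuming the same column names as df1 (the data is retyped from the question, and the names pos, first3 and res are made up for illustration):

```r
# Same values as the question's data, without the constant Total column
df1 <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "B", "B", "C", "C", "C", "C", "A", "A"),
  AC = c(1, 2, 1, 3, 2, 1, 3, 4, 5, 2, 3, 4, 5, 4, 5),
  Final_Coverage = c(55, 66, 77, 88, 99, 11, 12, 13, 22, 33, 44, 55, 102, 105, 109)
)

# Number the rows within each class in order of appearance
pos <- ave(seq_len(nrow(df1)), df1$Class, FUN = seq_along)

# Keep the first three rows per class, then average both columns by class
first3 <- df1[pos <= 3, ]
res <- aggregate(cbind(AC, Final_Coverage) ~ Class, data = first3, FUN = mean)
print(res)
```

On this data the result should match the aggregate() output above, since the first three rows of each class are exactly those with AC <= 3.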
Upvotes: 1
Reputation: 748
This should get you the new dataframe by using dplyr.
library(dplyr)
dataset %>% group_by(Class) %>% slice(1:3) %>%
  summarise(AC = mean(AC), Coverage = mean(Final_Coverage))
The error in your approach is that you initialise new_data with 0 rows and then try to assign a single value to one of its columns. That is what the error message says: the replacement has 1 row, but the data has 0. Pre-allocating one row per class works, though:
new_data <- data.frame(Class = classes, AC = NA, Coverage = NA)
for(class in classes){
  # Class is already filled in above; assigning it here would overwrite every row
  dataClass <- subset(dataset, Class == class)
  firstRows <- dataClass[1:3,]
  coverageMean <- mean(firstRows$Final_Coverage)
  acMean <- mean(firstRows$AC)
  # Write the means only into the row that belongs to the current class
  new_data$Coverage[classes == class] <- coverageMean
  new_data$AC[classes == class] <- acMean
}
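To make the zero-row problem concrete, here is a rough sketch that first reproduces the failing assignment and then shows another way to avoid it: build one single-row data frame per class and bind them together. The sample data is a subset of the question's rows, and the names empty, msg, rows and firstRows are only illustrative.

```r
# A subset of the question's rows, enough to have three or more rows per class
dataset <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "C", "C", "C"),
  AC = c(1, 2, 1, 3, 2, 1, 3, 2, 3, 4),
  Final_Coverage = c(55, 66, 77, 88, 99, 11, 12, 33, 44, 55)
)

# Assigning a length-1 value to a column of a zero-row data frame fails;
# tryCatch captures the error message instead of stopping the script
empty <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))
msg <- tryCatch({
  empty$Class <- "A"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Alternative: one single-row data frame per class, bound together at the end
rows <- lapply(split(dataset, dataset$Class), function(d) {
  firstRows <- head(d, 3)  # first three rows of this class
  data.frame(Class = d$Class[1],
             AC = mean(firstRows$AC),
             Coverage = mean(firstRows$Final_Coverage))
})
new_data <- do.call(rbind, rows)
rownames(new_data) <- NULL
print(new_data)
```

This avoids pre-allocating new_data altogether, at the cost of creating a small data frame per class, which should be fine for a handful of classes.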
Upvotes: 2