I am given a data frame of 10 students, each with a score on 4 different tests. I must select each student's 3 best scores and compute the average of those 3.
noma interro1 interro2 interro3 interro4
1 836016120449 6 3 NA 3
2 596844884419 1 4 2 8
3 803259953398 2 2 9 1
4 658786759629 3 1 3 2
5 571155022756 4 9 1 4
6 576037886365 8 7 8 7
7 045086625199 9 6 7 6
8 621909979467 5 8 4 5
9 457029205538 7 5 6 9
10 402526220817 NA 10 5 10
This dataframe provides the scores for 4 tests for 10 students. Write a function that calculates the average score for the 3 best tests. Calculate this average score for the 10 students.
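To make the target value concrete, here is a minimal sketch that computes it for two students from the table. `best3_mean` is a hypothetical helper name, not part of the assignment. Note that `sort()` drops `NA` values by default (`na.last = NA`), so student 1's missing test does not enter the average.

```r
# Hypothetical helper: mean of the 3 highest scores in one student's row.
# sort() removes NA by default (na.last = NA), so a missing test is skipped.
best3_mean <- function(scores) {
  mean(sort(scores, decreasing = TRUE)[1:3], na.rm = TRUE)
}

best3_mean(c(1, 4, 2, 8))   # student 2: (8 + 4 + 2) / 3, about 4.67
best3_mean(c(6, 3, NA, 3))  # student 1: NA dropped, (6 + 3 + 3) / 3 = 4
```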
average <- function(t){
x <- sort(t, decreasing = TRUE)[1:3]
return(mean(x, na.rm=TRUE))
}
apply(interro2, 1, average)
Since I want the 3 best scores, I thought sort() would be useful here. However, this is what I get:
In mean.default(x, na.rm = TRUE) :
argument is not numeric or logical: returning NA
I tried this version too:
average <- function(t){
rowMeans(sort(t, decreasing = TRUE, na.rm=TRUE)[1:3])
}
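This second attempt cannot work as written: `sort()` has no `na.rm` argument, and `rowMeans()` needs a matrix, not the single row vector that a row-wise function receives. A minimal sketch of a working `rowMeans()` variant, assuming the scores sit in a data frame without the `noma` column (`scores` and `top3` are hypothetical names), first collects each student's top 3 into a matrix:

```r
# Scores only: the noma ID column is left out so the matrix stays numeric.
scores <- data.frame(
  interro1 = c(6, 1, 2, 3, 4, 8, 9, 5, 7, NA),
  interro2 = c(3, 4, 2, 1, 9, 7, 6, 8, 5, 10),
  interro3 = c(NA, 2, 9, 3, 1, 8, 7, 4, 6, 5),
  interro4 = c(3, 8, 1, 2, 4, 7, 6, 5, 9, 10)
)

# For each row, sort descending with NAs placed last and keep 3 values.
# apply() returns one column per student, so t() gives one row per student.
top3 <- t(apply(scores, 1, function(r) sort(r, decreasing = TRUE, na.last = TRUE)[1:3]))

# rowMeans() works on the 10 x 3 matrix; na.rm = TRUE ignores any NA
# that lands in the top 3 (a student with fewer than 3 scores).
avg3 <- rowMeans(top3, na.rm = TRUE)
avg3
```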
UPDATE: Solved. The problem was the data frame passed to apply(): its first column, noma, holds the student IDs as character strings, so apply() turned every row into a character vector and mean() returned NA. After removing that column, the version below works:
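average <- function(t){
  # sort() drops NAs by default; na.rm = TRUE covers students with fewer than 3 scores
  x <- sort(t, decreasing = TRUE)[1:3]
  return(mean(x, na.rm=TRUE))
}
# [-1] drops the noma column so apply() works on a numeric matrix
apply(interro2[-1], 1, average)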
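To see why the ID column triggered the warning, here is a small reproduction (`df` is a hypothetical two-student subset of the table). `apply()` first converts a data frame to a matrix with `as.matrix()`, and a single character column is enough to make the whole matrix character.

```r
# Two students, with noma stored as text like in the question.
df <- data.frame(
  noma = c("836016120449", "596844884419"),
  interro1 = c(6, 1),
  interro2 = c(3, 4),
  stringsAsFactors = FALSE
)

typeof(as.matrix(df))      # "character": every score becomes a string
typeof(as.matrix(df[-1]))  # "double": without noma the matrix stays numeric

# mean() on character data returns NA with the warning from the question
mean(c("6", "3", "3"))
```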
Try pivoting the scores to long format, then sort each student's scores in descending order and keep the top 3. Finally, take the average grouped by student (noma):
library(dplyr)
library(tidyr)
data <- data.frame(
stringsAsFactors = FALSE,
noma = c("836016120449","596844884419",
"803259953398","658786759629","571155022756",
"576037886365","045086625199","621909979467","457029205538",
"402526220817"),
interro1 = c(6L, 1L, 2L, 3L, 4L, 8L, 9L, 5L, 7L, NA),
interro2 = c(3L, 4L, 2L, 1L, 9L, 7L, 6L, 8L, 5L, 10L),
interro3 = c(NA, 2L, 9L, 3L, 1L, 8L, 7L, 4L, 6L, 5L),
interro4 = c(3L, 8L, 1L, 2L, 4L, 7L, 6L, 5L, 9L, 10L)
)
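# Long format: one row per student and test. Missing scores become 0, which is
# harmless here because no student has more than one missing test.
data <- data %>%
  pivot_longer(!noma, names_to = "interro", values_to = "value") %>%
  replace_na(list(value = 0))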
data_new1 <- data[order(data$noma, data$value, decreasing = TRUE), ] # Order data descending
data_new1 <- Reduce(rbind, by(data_new1, data_new1["noma"], head, n = 3)) # Top N highest values by group
data_new1 <- data_new1 %>% group_by(noma) %>% summarise(Value_mean = mean(value))
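A shorter variant of the same pipeline, as a sketch assuming dplyr 1.0.0 or later (where `slice_max()` was introduced). It replaces the `order()`/`by()`/`Reduce()` step with `slice_max()`, and drops missing scores through `pivot_longer()`'s `values_drop_na` argument instead of setting them to 0, so a student with fewer than 3 real scores would not get a 0 averaged in. `top3_avg` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  stringsAsFactors = FALSE,
  noma = c("836016120449", "596844884419", "803259953398", "658786759629",
           "571155022756", "576037886365", "045086625199", "621909979467",
           "457029205538", "402526220817"),
  interro1 = c(6L, 1L, 2L, 3L, 4L, 8L, 9L, 5L, 7L, NA),
  interro2 = c(3L, 4L, 2L, 1L, 9L, 7L, 6L, 8L, 5L, 10L),
  interro3 = c(NA, 2L, 9L, 3L, 1L, 8L, 7L, 4L, 6L, 5L),
  interro4 = c(3L, 8L, 1L, 2L, 4L, 7L, 6L, 5L, 9L, 10L)
)

top3_avg <- data %>%
  # Long format; values_drop_na = TRUE removes missing scores instead of zeroing them
  pivot_longer(!noma, names_to = "interro", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(noma) %>%
  # Each student's 3 highest scores; with_ties = FALSE keeps at most 3 rows even on ties
  slice_max(value, n = 3, with_ties = FALSE) %>%
  summarise(Value_mean = mean(value))

top3_avg
```

`with_ties = FALSE` matters for students such as 576037886365 (scores 8, 7, 8, 7): with the default `with_ties = TRUE`, both 7s would be kept and the average would be taken over 4 scores.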