Diana

Reputation: 31

How to write a function in R that computes the average of the 3 best scores out of 4

I am given a data frame with 10 students, each with a score on 4 different tests. I must select each student's 3 best scores and compute their average.

           noma interro1 interro2 interro3 interro4
1  836016120449        6        3       NA        3
2  596844884419        1        4        2        8
3  803259953398        2        2        9        1
4  658786759629        3        1        3        2
5  571155022756        4        9        1        4
6  576037886365        8        7        8        7
7  045086625199        9        6        7        6
8  621909979467        5        8        4        5
9  457029205538        7        5        6        9
10 402526220817       NA       10        5       10

This dataframe provides the scores for 4 tests for 10 students. Write a function that calculates the average score for the 3 best tests. Calculate this average score for the 10 students.

average <- function(t){
  x <-  sort(t, decreasing = TRUE)[1:3]
  return(mean(x, na.rm=TRUE))
}

apply(interro2, 1, average)

Since I want the 3 best scores, I thought sort() would be useful here. However, what I get is this warning:

In mean.default(x, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

I also tried this:

average <- function(t){
  rowMeans(sort(t, decreasing = TRUE, na.rm=TRUE)[1:3])
}

UPDATE: Solved. The data frame passed to apply() still included the first column, which holds the student IDs, so the rows were not numeric. After dropping that column, the version below works:

average <- function(t){
  x <-  sort(t, decreasing = TRUE)[1:3]
  return(mean(x, na.rm=TRUE))
}

apply(interro2[-1], 1, average)
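
A minimal sketch of why dropping the first column matters. `apply()` converts a data frame to a matrix before looping over rows, and a single character column such as `noma` turns the whole matrix into character, so `mean()` ends up receiving strings. The data frame name `scores` and the helper `top3_mean` are made up for this illustration, and only the first three students from the question are included:

```r
# Hypothetical stand-in for the question's data (first three students only).
scores <- data.frame(
  noma = c("836016120449", "596844884419", "803259953398"),
  interro1 = c(6, 1, 2),
  interro2 = c(3, 4, 2),
  interro3 = c(NA, 2, 9),
  interro4 = c(3, 8, 1),
  stringsAsFactors = FALSE
)

# With the ID column included, apply() would work on a character matrix.
is.character(as.matrix(scores))    # TRUE
# Without it, the matrix is numeric.
is.numeric(as.matrix(scores[-1]))  # TRUE

# Mean of the three highest scores; sort() drops NA values by default.
top3_mean <- function(x) {
  mean(head(sort(x, decreasing = TRUE), 3))
}

result <- apply(scores[-1], 1, top3_mean)
```

Because `sort()` removes `NA` by default, a student with one missing test is simply averaged over the three scores that exist.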

Upvotes: 1

Views: 81

Answers (1)

juanbarq

Reputation: 399

Try pivoting the scores to long format, then sort them by student and score and keep each student's top 3. Finally, take the average grouped by student:

library(dplyr)
library(tidyr)

data <- data.frame(
  stringsAsFactors = FALSE,
  noma = c("836016120449","596844884419",
           "803259953398","658786759629","571155022756",
           "576037886365","045086625199","621909979467","457029205538",
           "402526220817"),
  interro1 = c(6L, 1L, 2L, 3L, 4L, 8L, 9L, 5L, 7L, NA),
  interro2 = c(3L, 4L, 2L, 1L, 9L, 7L, 6L, 8L, 5L, 10L),
  interro3 = c(NA, 2L, 9L, 3L, 1L, 8L, 7L, 4L, 6L, 5L),
  interro4 = c(3L, 8L, 1L, 2L, 4L, 7L, 6L, 5L, 9L, 10L)
)

data <- data %>%
  pivot_longer(!noma, names_to = "interro", values_to = "value") %>%
  replace_na(list(value = 0))

data_new1 <- data[order(data$noma, data$value, decreasing = TRUE), ]  # Order data descending
data_new1 <- Reduce(rbind, by(data_new1, data_new1["noma"], head, n = 3)) # Top N highest values by group

data_new1 <- data_new1 %>% group_by(noma) %>% summarise(Value_mean = mean(value))
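
One thing to be aware of in the code above: `replace_na(list(value = 0))` counts a missing test as a score of 0 instead of dropping it. For the data in the question this makes no difference, since no student has more than one missing score, but the two conventions diverge once a student misses two or more tests. A small base R sketch of the difference, using a made-up score vector that is not part of the question's data:

```r
# Hypothetical student with two missing tests (illustrative values only).
x <- c(8, NA, 6, NA)

# Convention used in this answer: a missing score counts as 0.
x_zero <- ifelse(is.na(x), 0, x)
mean_na_as_zero <- mean(head(sort(x_zero, decreasing = TRUE), 3))  # (8 + 6 + 0) / 3

# Convention in the question's update: sort() drops NA, so only the
# scores that exist are averaged.
mean_na_dropped <- mean(head(sort(x, decreasing = TRUE), 3))       # (8 + 6) / 2
```

Which convention is appropriate depends on how missing tests are meant to be graded, which the question does not say.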

Upvotes: 1
