bvowe
bvowe

Reputation: 3384

Data Table Solution In R To Find Group Min/Max

Data

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,7,6,6,1,4,8,NA,3),
                "min"=c(6,6,6,1,1,1,3,3,3),
                "max"=c(7,7,7,6,6,6,8,8,8))

I have columns 'student' and 'score' and wish to use data.table to create 'min' and 'max' which simply put are the minimum and maximum values for each student IGNORING NA Values. If all values are NA then simply list 'NA' as the min/max.

Upvotes: 3

Views: 1583

Answers (4)

chinsoon12
chinsoon12

Reputation: 25225

Another data.table option:

setDT(data)[, c("min","max") := as.list(range(score, na.rm=TRUE)), student]

Upvotes: 1

Edward
Edward

Reputation: 18513

But OP wanted to return NA if all scores for any student were NA. This solution fixes the Inf problem.

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,NA,NA,6,1,4,8,NA,3))
> dt <- data.table(data); dt
   student score
1:       1    NA
2:       1    NA
3:       1    NA
4:       2     6
5:       2     1
6:       2     4
7:       3     8
8:       3    NA
9:       3     3

Create a function to handle the case where all values are NA, to return NA

min.na = function(x) if (all(is.na(x))) x[NA_integer_] else min(x, na.rm = TRUE)
max.na = function(x) if (all(is.na(x))) x[NA_integer_] else max(x, na.rm = TRUE)

dt[, c("min", "max") := list(min.na(score), max.na(score)), by=student]
dt
   student score min max
1:       1    NA  NA  NA
2:       1    NA  NA  NA
3:       1    NA  NA  NA
4:       2     6   1   6
5:       2     1   1   6
6:       2     4   1   6
7:       3     8   3   8
8:       3    NA   3   8
9:       3     3   3   8

Edit: And I'm not sure why you'd want to do this anyway. Combining summary statistics to the original data is bad practice. It results in redundancy/duplication. Surely you just want a separate result for each student:

dt[, .(min=min.na(score), max=max.na(score)), by=student]
   student min max
1:       1  NA  NA
2:       2   1   6
3:       3   3   8

I know this last part is not what was asked, but I always check that what they ask for is what they really wanted. ;)

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388807

Using data.table

library(data.table)

setDT(data)
data[, c("min", "max"):= list(min(score, na.rm = TRUE), 
                              max(score,  na.rm = TRUE)), student]
data
#   student score min max
#1:       1    NA   6   7
#2:       1     7   6   7
#3:       1     6   6   7
#4:       2     6   1   6
#5:       2     1   1   6
#6:       2     4   1   6
#7:       3     8   3   8
#8:       3    NA   3   8
#9:       3     3   3   8

Or with dplyr

library(dplyr)
data %>%
  group_by(student) %>%
  mutate(min =  min(score, na.rm = TRUE), max = max(score, na.rm = TRUE))

Upvotes: 3

R. Schifini
R. Schifini

Reputation: 9313

You can do this using the function ave:

data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
                "score"=c(NA,7,6,6,1,4,8,NA,3))

data$min = ave(data$score, data$student, FUN = function(x){ min(x, na.rm = T) })
data$max = ave(data$score, data$student, FUN = function(x){ max(x, na.rm = T) })

Result:

> data
  student score min max
1       1    NA   6   7
2       1     7   6   7
3       1     6   6   7
4       2     6   1   6
5       2     1   1   6
6       2     4   1   6
7       3     8   3   8
8       3    NA   3   8
9       3     3   3   8

The function ave takes a numeric vector as the first parameter and all following vectors are the grouping variables. The FUN parameter is the function you wish to apply.

Upvotes: 0

Related Questions