Reputation: 3384
Data
data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
"score"=c(NA,7,6,6,1,4,8,NA,3),
"min"=c(6,6,6,1,1,1,3,3,3),
"max"=c(7,7,7,6,6,6,8,8,8))
I have columns 'student' and 'score' and wish to use data.table to create 'min' and 'max' which simply put are the minimum and maximum values for each student IGNORING NA Values. If all values are NA then simply list 'NA' as the min/max.
Upvotes: 3
Views: 1583
Reputation: 25225
Another data.table
option:
setDT(data)[, c("min","max") := as.list(range(score, na.rm=TRUE)), student]
Upvotes: 1
Reputation: 18513
But OP wanted to return NA if all scores for any student were NA. This solution fixes the Inf problem.
data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
"score"=c(NA,NA,NA,6,1,4,8,NA,3))
> dt <- data.table(data); dt
student score
1: 1 NA
2: 1 NA
3: 1 NA
4: 2 6
5: 2 1
6: 2 4
7: 3 8
8: 3 NA
9: 3 3
Create a function to handle the case where all values are NA, to return NA
min.na = function(x) if (all(is.na(x))) x[NA_integer_] else min(x, na.rm = TRUE)
max.na = function(x) if (all(is.na(x))) x[NA_integer_] else max(x, na.rm = TRUE)
dt[, c("min", "max") := list(min.na(score), max.na(score)), by=student]
dt
student score min max
1: 1 NA NA NA
2: 1 NA NA NA
3: 1 NA NA NA
4: 2 6 1 6
5: 2 1 1 6
6: 2 4 1 6
7: 3 8 3 8
8: 3 NA 3 8
9: 3 3 3 8
Edit: And I'm not sure why you'd want to do this anyway. Combining summary statistics to the original data is bad practice. It results in redundancy/duplication. Surely you just want a separate result for each student:
dt[, .(min=min.na(score), max=max.na(score)), by=student]
student min max
1: 1 NA NA
2: 2 1 6
3: 3 3 8
I know this last part is not what was asked, but I always check that what they ask for is what they really wanted. ;)
Upvotes: 3
Reputation: 388807
Using data.table
library(data.table)
setDT(data)
data[, c("min", "max"):= list(min(score, na.rm = TRUE),
max(score, na.rm = TRUE)), student]
data
# student score min max
#1: 1 NA 6 7
#2: 1 7 6 7
#3: 1 6 6 7
#4: 2 6 1 6
#5: 2 1 1 6
#6: 2 4 1 6
#7: 3 8 3 8
#8: 3 NA 3 8
#9: 3 3 3 8
Or with dplyr
library(dplyr)
data %>%
group_by(student) %>%
mutate(min = min(score, na.rm = TRUE), max = max(score, na.rm = TRUE))
Upvotes: 3
Reputation: 9313
You can do this using the function ave
:
data=data.frame("student"=c(1,1,1,2,2,2,3,3,3),
"score"=c(NA,7,6,6,1,4,8,NA,3))
data$min = ave(data$score, data$student, FUN = function(x){ min(x, na.rm = T) })
data$max = ave(data$score, data$student, FUN = function(x){ max(x, na.rm = T) })
Result:
> data
student score min max
1 1 NA 6 7
2 1 7 6 7
3 1 6 6 7
4 2 6 1 6
5 2 1 1 6
6 2 4 1 6
7 3 8 3 8
8 3 NA 3 8
9 3 3 3 8
The function ave
takes a numeric vector as the first parameter and all following vectors are the grouping variables. The FUN parameter is the function you wish to apply.
Upvotes: 0