bvowe
bvowe

Reputation: 3394

R DataTable Select Rows

data1=data.frame("Student"=c(1, 1, 1, 2, 2, 2, 3, 3, 3),
"Grade"=c(1, 2, 3, 1, 2, 3, 1, 2, 3),
"Score"=c(5, 7, 9, 2, 2, 3, 10, NA, 3))


data2=data.frame("Student"=c(1, 1, 1, 3, 3, 3),
"Grade"=c(1, 2, 3, 1, 2, 3),
"Score"=c(5, 7, 9, 10, NA, 3))

I have 'data1' and wish for 'data2' where I ONLY include 'Student' if 'Score' at 'Grade' = 1 is at least 4.

My only knowledge of how to do this is doing it by 'Grade' and 'Score' but that does not give desired output.

library(data.table)
setDT(data1)
data1=data1[Grade==1 & Score >=4)

how is it possible to specify that I wish to select all STUDENTS who have a Score>=4 at Grade 1 and not just the ROWS

Upvotes: 1

Views: 81

Answers (1)

Francesco Grossetti
Francesco Grossetti

Reputation: 1595

You just need to do a join with your desired conditions to retain the Student id.

Does this work?

library(data.table)
data1 <- data.frame("Student"=c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    "Grade"=c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                    "Score"=c(5, 7, 9, 2, 2, 3, 10, NA, 3))
data2 <- data.frame("Student"=c(1, 1, 1, 3, 3, 3),
                    "Grade"=c(1, 2, 3, 1, 2, 3),
                    "Score"=c(5, 7, 9, 10, NA, 3))
setDT(data1)
setDT(data2)

wanted <- data1[ Grade == 1 & Score >= 4, .( Student ) ]

setkey( wanted, Student )
setkey( data1, Student )

data3 = data1[ wanted ]

data2
#>    Student Grade Score
#> 1:       1     1     5
#> 2:       1     2     7
#> 3:       1     3     9
#> 4:       3     1    10
#> 5:       3     2    NA
#> 6:       3     3     3
data3
#>    Student Grade Score
#> 1:       1     1     5
#> 2:       1     2     7
#> 3:       1     3     9
#> 4:       3     1    10
#> 5:       3     2    NA
#> 6:       3     3     3

Created on 2020-04-29 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions