giraffe
giraffe

Reputation: 13

R data.table conditional (min/max) aggregation

I'm relatively new to R and I have a question regarding how to do conditional aggregation using data.tables (or other methods) while still accessing the table columns by reference. There was an answer to a similar question here but it takes a long time on my data and takes a lot of memory. Here is some toy data:

t <- data.table(User=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
  Obs=c(1,2,3,4,5,1,2,3,4,1,2,3,4,5,6),
  Flag=c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0))

Which looks like this:

    User Obs Flag
1:     1   1    0
2:     1   2    1
3:     1   3    0
4:     1   4    1
5:     1   5    0
6:     2   1    0
7:     2   2    1
8:     2   3    0
9:     2   4    0
10:    3   1    1
11:    3   2    0
12:    3   3    0
13:    3   4    0
14:    3   5    1
15:    3   6    0

What I would like to do with this is to get the maximum observation less than the current observation where the flag is 1, by user. The output should look like this:

    User Obs Flag min.max
1:     1   1    0     NA
2:     1   2    1      2
3:     1   3    0      2
4:     1   4    1      4
5:     1   5    0      4
6:     2   1    0     NA
7:     2   2    1      2
8:     2   3    0      2
9:     2   4    0      2
10:    3   1    1      1
11:    3   2    0      1
12:    3   3    0      1
13:    3   4    0      1
14:    3   5    1      5
15:    3   6    0      5

Any help would be greatly appreciated!

Upvotes: 1

Views: 604

Answers (1)

eddi
eddi

Reputation: 49448

t[, max := Obs[Flag == 1], by = .(User, cumsum(diff(c(0, Flag)) == 1))]
t
#    User Obs Flag max
# 1:    1   1    0  NA
# 2:    1   2    1   2
# 3:    1   3    0   2
# 4:    1   4    1   4
# 5:    1   5    0   4
# 6:    2   1    0  NA
# 7:    2   2    1   2
# 8:    2   3    0   2
# 9:    2   4    0   2
#10:    3   1    1   1
#11:    3   2    0   1
#12:    3   3    0   1
#13:    3   4    0   1
#14:    3   5    1   5
#15:    3   6    0   5

Upvotes: 3

Related Questions