Neal Barsch
Neal Barsch

Reputation: 2940

Generate group by condition on row value in column R data.table

I want to split a data.table in R into groups based on a condition in the value of a row. I have searched SO extensively and can't find an efficient data.table way to do this (I'm not looking to for loop across rows)

I have data like this:

library(data.table)
dt1 <- data.table( x=1:139, t=c(rep(c(1:5),10),120928,rep(c(6:10),9), 10400,rep(c(13:19),6)))

I'd like to group at the large numbers (over a settable value) and come up with the example below:

dt.desired <- data.table( x=1:139, t=c(rep(c(1:5),10),120928,rep(c(6:10),9), 10400,rep(c(13:19),6)), group=c(rep(1,50),rep(2,46),rep(3,43)))

Upvotes: 0

Views: 177

Answers (2)

DanY
DanY

Reputation: 6073

You can use a test like t>100 to find the large values. You can then use cumsum() to get a running integer for each set of rows up to (but not including) the large number.

# assuming you can define "large" as >100
dt1[ , islarge := t>100]
dt1[ , group := shift(cumsum(islarge))]

I understand that you want the large number to be part of the group above it. To do this, use shift() and then fill in the first value (which will be NA after shift() is run.

# a little cleanup 
# (fix first value and start group at 1 instead of 0)
dt1[1, group := 0]
dt1[ , group := group+1]

Upvotes: 0

dayne
dayne

Reputation: 7794

dt1[ , group := cumsum(t > 200) + 1]

dt1[t > 200]
#     x      t group
# 1: 51 120928     2
# 2: 97  10400     3
dt.desired[t > 200]
#     x      t group
# 1: 51 120928     2
# 2: 97  10400     3

Upvotes: 3

Related Questions