Arturo Sbr
Arturo Sbr

Reputation: 6333

Flag an entire group if a single row meets a condition using data.table

I have the following sample data:

> so <- data.table(Credit_id = rep(c("1-A", "17-F", "2-D"), each = 3), Period = rep(1:3, times = 3), Due_days = c(0,0,0, 0,30,0, 0,30,60))
> so
   Credit_id Period Due_days
1:       1-A      1        0
2:       1-A      2        0
3:       1-A      3        0
4:      17-F      1        0
5:      17-F      2       30
6:      17-F      3        0
7:       2-D      1        0
8:       2-D      2       30
9:       2-D      3       60

The data shows how three different credits performed during their first three months in a portfolio. Credit_id is the main key, Period is a time index and Due_days shows how many days a client was overdue in a given period.

I want to create a new column Flag which can take two values: 0 and 1. Flag should take the value 1 if a credit (which is grouped by Credit_id) was ever equal to or greater than 30.

This is the result I want to get to:

   Credit_id Period Due_days Flag
1:       1-A      1        0    0
2:       1-A      2        0    0
3:       1-A      3        0    0
4:      17-F      1        0    1
5:      17-F      2       30    1
6:      17-F      3        0    1
7:       2-D      1        0    1
8:       2-D      2       30    1
9:       2-D      3       60    1

That is, assign a 1 to the groups who had at least one row where Due_days >= 30.

Upvotes: 0

Views: 708

Answers (1)

tmfmnk
tmfmnk

Reputation: 40171

You can do:

dt[, flag := +(any(Due_days >= 30)), by = Credit_id]

   Credit_id Period Due_days flag
1:       1-A      1        0    0
2:       1-A      2        0    0
3:       1-A      3        0    0
4:      17-F      1        0    1
5:      17-F      2       30    1
6:      17-F      3        0    1
7:       2-D      1        0    1
8:       2-D      2       30    1
9:       2-D      3       60    1

Or the same with base R:

with(dt, ave(Due_days, Credit_id, FUN = function(x) +(any(x >= 30))))

any() tests whether at least one value per group fulfills the condition. As @Calum You already noted, + is just a quick way to transform a logical vector into a vector of integers.

To illustrate the use of +:

+(c(TRUE, FALSE))
[1] 1 0

Other possibilities are:

c(TRUE, FALSE) * 1

Or:

as.integer(c(TRUE, FALSE))

Upvotes: 3

Related Questions