Reputation: 6333
I have the following sample data:
> so <- data.table(Credit_id = rep(c("1-A", "17-F", "2-D"), each = 3), Period = rep(1:3, times = 3), Due_days = c(0,0,0, 0,30,0, 0,30,60))
> so
Credit_id Period Due_days
1: 1-A 1 0
2: 1-A 2 0
3: 1-A 3 0
4: 17-F 1 0
5: 17-F 2 30
6: 17-F 3 0
7: 2-D 1 0
8: 2-D 2 30
9: 2-D 3 60
The data shows how three different credits performed during their first three months in a portfolio. Credit_id
is the main key, Period
is a time index and Due_days
shows how many days a client was overdue in a given period.
I want to create a new column Flag
which can take two values: 0 and 1. Flag
should take the value 1 if a credit (which is grouped by Credit_id
) was ever equal to or greater than 30.
This is the result I want to get to:
Credit_id Period Due_days Flag
1: 1-A 1 0 0
2: 1-A 2 0 0
3: 1-A 3 0 0
4: 17-F 1 0 1
5: 17-F 2 30 1
6: 17-F 3 0 1
7: 2-D 1 0 1
8: 2-D 2 30 1
9: 2-D 3 60 1
That is, assign a 1
to the groups who had at least one row where Due_days >= 30
.
Upvotes: 0
Views: 708
Reputation: 40171
You can do:
dt[, flag := +(any(Due_days >= 30)), by = Credit_id]
Credit_id Period Due_days flag
1: 1-A 1 0 0
2: 1-A 2 0 0
3: 1-A 3 0 0
4: 17-F 1 0 1
5: 17-F 2 30 1
6: 17-F 3 0 1
7: 2-D 1 0 1
8: 2-D 2 30 1
9: 2-D 3 60 1
Or the same with base R
:
with(dt, ave(Due_days, Credit_id, FUN = function(x) +(any(x >= 30))))
any()
tests whether at least one value per group fulfills the condition. As @Calum You already noted, +
is just a quick way to transform a logical vector into a vector of integers.
To illustrate the use of +
:
+(c(TRUE, FALSE))
[1] 1 0
Other possibilities are:
c(TRUE, FALSE) * 1
Or:
as.integer(c(TRUE, FALSE))
Upvotes: 3