Reputation: 3082
table .For this data table I have a column for balance and I want to create a column based on changes in balance column. To mock up the data
set.seed(1)
zzz <- data.table(name=rep(c('x','y','z'),100),
balance=sample(100:300,100,replace = T), key = "name")
Now I have changed few datas to mock up what I want to do
zzz[5:7,2] <-238
zzz[20:22,2]<- 204
as I want to make some balance same in order to show what I intend to achieve ,
Now my code to achive a flag for change in balance for each name is as follows
cur_bal <- -99999999
bal_change_flag <- function(x){
#cur_bal <- -99999999
flag <- 1
if(x !=cur_bal){
flag <- 1
}else {flag <- 0}
cur_bal <<-x
return (flag)
}
zzz[,code:=lapply(balance,bal_change_flag),name]
Now I expect zzz[5:7,2:3,with=F]
as
balance code
1: 238 1
2: 238 0
3: 238 0
but I get it as
balance code
1: 238 1
2: 238 1
3: 238 1
the function works fine if I dont group it by name but fails to achieve the same when I group it by name .Please advise :) thanks
> zzz[,code:=lapply(balance,bal_change_flag)]
> zzz[5:7,2:3,with=F]
balance code
1: 238 1
2: 238 0
3: 238 0
Upvotes: 1
Views: 76
Reputation: 34703
I'm not sure what you're trying to do, but I guess you're trying to create a flag for whether the balance changed in each row?
I think you've made a poor example. Perhaps this is better:
set.seed(3040)
data <- data.table(name = rep(c("x","y","z"), 100),
balance = sample(3, 100, TRUE), key = "name")
I think what you're after is as simple as:
data[ , code := balance == shift(balance, fill = TRUE), by = name]
With output:
> data
name balance code
1: x 2 TRUE #first obs. within group always TRUE thanks to 'fill'
2: x 3 TRUE #3 != 2
3: x 1 TRUE #1 != 3
4: x 1 FALSE #1 == 1
5: x 2 TRUE
---
296: z 3 TRUE
297: z 3 FALSE
298: z 2 TRUE
299: z 1 TRUE
300: z 1 FALSE
You could force this to be 1
and 0
by using as.integer
or +
, but why?
Upvotes: 3
Reputation: 10473
Dplyr is much more intuitive and easy in my opinion. Using it here, you can do something like this:
library(dplyr)
> zzz <- data.frame(name=rep(c('x','y','z'),100),balance=sample(100:300,100,replace = T))
> zzz <- arrange(zzz, name)
> zzz[5:7,2] <-238
> zzz[20:22,2]<- 204
> zzz <- zzz %>% group_by(name) %>% mutate(code = as.numeric(balance != lag(balance)))
> zzz[5:7, ]
Source: local data frame [3 x 3]
Groups: name [1]
name balance code
(fctr) (dbl) (dbl)
1 x 238 1
2 x 238 0
3 x 238 0
> zzz[20:22, ]
Source: local data frame [3 x 3]
Groups: name [1]
name balance code
(fctr) (dbl) (dbl)
1 x 204 1
2 x 204 0
3 x 204 0
>
Upvotes: 1