Bg1850
Bg1850

Reputation: 3082

create a column in a data table based on changes on another column in R data.table

table .For this data table I have a column for balance and I want to create a column based on changes in balance column. To mock up the data

set.seed(1)
zzz  <- data.table(name=rep(c('x','y','z'),100),
                   balance=sample(100:300,100,replace = T), key = "name")

Now I have changed few datas to mock up what I want to do

zzz[5:7,2] <-238
zzz[20:22,2]<- 204

as I want to make some balance same in order to show what I intend to achieve ,

Now my code to achive a flag for change in balance for each name is as follows

cur_bal <- -99999999

bal_change_flag <- function(x){
  #cur_bal <- -99999999
  flag <- 1
  if(x !=cur_bal){
    flag <- 1
  }else {flag <- 0}
  cur_bal <<-x 
  return (flag)
}
zzz[,code:=lapply(balance,bal_change_flag),name]

Now I expect zzz[5:7,2:3,with=F]as

   balance code
1:     238    1
2:     238    0
3:     238    0

but I get it as

   balance code
1:     238    1
2:     238    1
3:     238    1

the function works fine if I dont group it by name but fails to achieve the same when I group it by name .Please advise :) thanks

> zzz[,code:=lapply(balance,bal_change_flag)]
> zzz[5:7,2:3,with=F]
   balance code
1:     238    1
2:     238    0
3:     238    0

Upvotes: 1

Views: 76

Answers (2)

MichaelChirico
MichaelChirico

Reputation: 34703

I'm not sure what you're trying to do, but I guess you're trying to create a flag for whether the balance changed in each row?

I think you've made a poor example. Perhaps this is better:

set.seed(3040)
data <- data.table(name = rep(c("x","y","z"), 100),
                   balance = sample(3, 100, TRUE), key = "name")

I think what you're after is as simple as:

data[ , code := balance == shift(balance, fill = TRUE), by = name]

With output:

> data
     name balance  code
  1:    x       2  TRUE #first obs. within group always TRUE thanks to 'fill'
  2:    x       3  TRUE #3 != 2
  3:    x       1  TRUE #1 != 3
  4:    x       1 FALSE #1 == 1
  5:    x       2  TRUE
 ---                   
296:    z       3  TRUE
297:    z       3 FALSE
298:    z       2  TRUE
299:    z       1  TRUE
300:    z       1 FALSE

You could force this to be 1 and 0 by using as.integer or +, but why?

Upvotes: 3

Gopala
Gopala

Reputation: 10473

Dplyr is much more intuitive and easy in my opinion. Using it here, you can do something like this:

library(dplyr)
> zzz  <- data.frame(name=rep(c('x','y','z'),100),balance=sample(100:300,100,replace = T))
> zzz <- arrange(zzz, name)
> zzz[5:7,2] <-238
> zzz[20:22,2]<- 204
> zzz <- zzz %>% group_by(name) %>% mutate(code = as.numeric(balance != lag(balance)))
> zzz[5:7, ]
Source: local data frame [3 x 3]
Groups: name [1]

    name balance  code
  (fctr)   (dbl) (dbl)
1      x     238     1
2      x     238     0
3      x     238     0
> zzz[20:22, ]
Source: local data frame [3 x 3]
Groups: name [1]

    name balance  code
  (fctr)   (dbl) (dbl)
1      x     204     1
2      x     204     0
3      x     204     0
> 

Upvotes: 1

Related Questions