Kevin
Kevin

Reputation: 61

Referencing to other row in data.table

I have a data.table of the following structure:

> head(DF)
                                         X X00000001 Rit
1: %100,11410,      ,001,002,Nachtnettrein         1   1
2: %100,01412,      ,002,007,Nachtnettrein         0   0
3:                          -00002,000,999         0   0
4:                           &IC  ,001,007         0   0
5:                           >rtd    ,2532         0   0
6:                      ?11   ,11   ,00002         0   0

When column X00000001 != 0, column "Rit" should take the value of the adjacent X00000001 column. When column X00000001 == 0 it should take on the value of column "Rit"-1 (the previous row). I was able to do this when I still had a data frame using:

for(i in nrow(DF)) {

  ifelse(DF$X00000001[i] == 0, DF$Rit[i] <- DF$Rit[i-1], DF$Rit[i] <- DF$X00000001[i])
}

But because it is a big dataset, I had to convert to data.table. Although the method above does still work, it is painstakingly slow. I tried to look around for the data.table way to do this and came to this:

  DF[X00000001 == 0, Rit := 2]
  DF[X00000001 != 0, Rit := X00000001]

The != 0 case works just fine. In the == 0 case I used 2 as a dummy variable and it worked just fine. The thing is that instead of 2, I want it to take on the value of the row before it. In pseudocode something like this:

  DF[X00000001 == 0, Rit := Rit[n-1]
  DF[X00000001 != 0, Rit := X00000001]

Any help would be appreciated, thanks.

Upvotes: 1

Views: 70

Answers (1)

tmfmnk
tmfmnk

Reputation: 39858

Using dplyr:

df %>%
  mutate(Rit = ifelse(X00000001 != 0, X00000001,
                    ifelse(X00000001 == 0, lag(Rit), NA)))

Or by using data.table:

setDT(df)
df[, Rit := ifelse(X00000001 != 0, X00000001,
                   ifelse(X00000001 == 0, shift(Rit, 1, type = "lag"), NA))]

Data:

  df <- read.table(text = "X00000001 Rit
1   1
0   0
0   0
0   0
0   0
0   0", header = TRUE)

Upvotes: 2

Related Questions