amy
amy

Reputation: 65

Indexing on condition with R

I have some data similar to below data

      Time output
       2     1
       2     1
       2     2
       2     2
       2     1
       2     2
       2     1

I need to add two columns in this data:

Expected output:

     Time output  index  total_time
       2     1      0         0
       2     1      0         0 
       2     2      1         4
       2     2      1         4
       2     1      0         0 
       2     2      2         2
       2     1      0         0

Thank you in Advance.

Upvotes: 1

Views: 116

Answers (3)

akrun
akrun

Reputation: 887118

Here is an option using data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), create the 'index' using rleid on the logical vector (output == 2), when 'index' is not 0, assign 'index' as the match between the elements in 'index' and unique values, create the 'total_time', grouped by 'index' where 'index' is not 0, if needed the NA elements can be converted to 0.

library(data.table)
setDT(df1)[, index:= rleid(output ==2)*(output ==2)
          ][index!=0, index := match(index, unique(index))
           ][index!=0, total_time :=sum(Time) , index
            ][is.na(total_time), total_time := 0]
df1
#   Time output index total_time
#1:    2      1     0          0
#2:    2      1     0          0
#3:    2      2     1          4
#4:    2      2     1          4
#5:    2      1     0          0
#6:    2      2     2          2
#7:    2      1     0          0

Upvotes: 1

thelatemail
thelatemail

Reputation: 93813

Using some indexing and stuffing about:

dat[c("index","total_time")] <- 0
hit <- dat$output==2
dat$index[hit] <- c(factor(cumsum(!hit)[hit]))
dat$total_time[hit] <- with(dat[hit,], ave(output, index, FUN=sum))

#  Time output index total_time
#1    2      1     0          0
#2    2      1     0          0
#3    2      2     1          4
#4    2      2     1          4
#5    2      1     0          0
#6    2      2     2          2
#7    2      1     0          0

Upvotes: 2

agstudy
agstudy

Reputation: 121568

here a solution based on rle and cumsum. I am adding comments to explain major steps even it is very hard to explain it in words. The solution is vectorized without any loop.

## init the vectors results with zeros
dx$index <- rep(0,nrow(dx))
dx$total_time <- rep(0,nrow(dx))
## use rle to get the position/length  
rr <- rle(dx$output)
## only the val 2 is important for us , so we store into index
ii <- rr$values==2
## we replace the occuronce of 2 in the original vector by the cumulative 
## repeating it : hard to explain !! 
vals <- cumsum(ii)[ii]
occurs <- rr$len[ii]
dx$index[dx$output==2] <- rep(vals,occurs)
## same thing for the total just we change the value here
dx$total_time[dx$output==2] <- rep(occurs*2,occurs)

#     Time  output index     total_time
# 1    2      1     0          0
# 2    2      1     0          0
# 3    2      2     1          4
# 4    2      2     1          4
# 5    2      1     0          0
# 6    2      2     2          2
# 7    2      1     0          0

where dx is read as:

dx <- read.table(text=" Time output
       2     1
           2     1
           2     2
           2     2
           2     1
           2     2
           2     1",header=T)

Upvotes: 2

Related Questions