Reputation: 65
I have some data similar to below data
Time output
2 1
2 1
2 2
2 2
2 1
2 2
2 1
I need to add two columns in this data:
index
: whenever output==2
it should count and the count remains same until it encounters 1 now if it encounters another 2 it should increase it count. total time
: It should sum time when output==2
between 1.Expected output:
Time output index total_time
2 1 0 0
2 1 0 0
2 2 1 4
2 2 1 4
2 1 0 0
2 2 2 2
2 1 0 0
Thank you in Advance.
Upvotes: 1
Views: 116
Reputation: 887118
Here is an option using data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), create the 'index' using rleid
on the logical vector (output == 2
), when 'index' is not 0, assign 'index' as the match
between the elements in 'index' and unique
values, create the 'total_time', grouped by 'index' where 'index' is not 0, if needed the NA
elements can be converted to 0.
library(data.table)
setDT(df1)[, index:= rleid(output ==2)*(output ==2)
][index!=0, index := match(index, unique(index))
][index!=0, total_time :=sum(Time) , index
][is.na(total_time), total_time := 0]
df1
# Time output index total_time
#1: 2 1 0 0
#2: 2 1 0 0
#3: 2 2 1 4
#4: 2 2 1 4
#5: 2 1 0 0
#6: 2 2 2 2
#7: 2 1 0 0
Upvotes: 1
Reputation: 93813
Using some indexing and stuffing about:
dat[c("index","total_time")] <- 0
hit <- dat$output==2
dat$index[hit] <- c(factor(cumsum(!hit)[hit]))
dat$total_time[hit] <- with(dat[hit,], ave(output, index, FUN=sum))
# Time output index total_time
#1 2 1 0 0
#2 2 1 0 0
#3 2 2 1 4
#4 2 2 1 4
#5 2 1 0 0
#6 2 2 2 2
#7 2 1 0 0
Upvotes: 2
Reputation: 121568
here a solution based on rle
and cumsum
. I am adding comments to explain major steps even it is very hard to explain it in words. The solution is vectorized without any loop.
## init the vectors results with zeros
dx$index <- rep(0,nrow(dx))
dx$total_time <- rep(0,nrow(dx))
## use rle to get the position/length
rr <- rle(dx$output)
## only the val 2 is important for us , so we store into index
ii <- rr$values==2
## we replace the occuronce of 2 in the original vector by the cumulative
## repeating it : hard to explain !!
vals <- cumsum(ii)[ii]
occurs <- rr$len[ii]
dx$index[dx$output==2] <- rep(vals,occurs)
## same thing for the total just we change the value here
dx$total_time[dx$output==2] <- rep(occurs*2,occurs)
# Time output index total_time
# 1 2 1 0 0
# 2 2 1 0 0
# 3 2 2 1 4
# 4 2 2 1 4
# 5 2 1 0 0
# 6 2 2 2 2
# 7 2 1 0 0
where dx is read as:
dx <- read.table(text=" Time output
2 1
2 1
2 2
2 2
2 1
2 2
2 1",header=T)
Upvotes: 2