KT_1

Reputation: 8494

Delete parts of dataframe when zeros are recorded for a set time period

I have a simple dataframe.

a <- c("06/12/2012 06:00","06/12/2012 06:05","06/12/2012 06:10","06/12/2012 06:15","06/12/2012 06:20","06/12/2012 06:25",
   "06/12/2012 06:30","06/12/2012 06:35","06/12/2012 06:40","06/12/2012 06:45","06/12/2012 06:50","06/12/2012 06:55",
   "06/12/2012 07:00","06/12/2012 07:05","06/12/2012 07:10","06/12/2012 07:15","06/12/2012 07:20","06/12/2012 07:25",
   "06/12/2012 07:30","06/12/2012 07:35","06/12/2012 07:40","06/12/2012 07:45","06/12/2012 07:50","06/12/2012 07:55",
   "06/12/2012 08:00")
a <- strptime(a, "%d/%m/%Y %H:%M")

b <- c("1","0","0","0","2","0","0","0","3","0","0","0","0","0","1","2","5","6","0","0","0","0","6","10","2")
df1 <- data.frame(a,b)

I want to use R to delete parts of my data frame where there is insufficient valid data. Data are recorded every 5 minutes. If only zeros are recorded in the 'b' column for 20 minutes or more in a row, those rows can be deleted from my final data frame.

If anyone has any ideas to help me, I would very much appreciate it.

Upvotes: 2

Views: 81

Answers (2)

Arun

Reputation: 118869

One solution uses rle (as Ben mentions in the comments):

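# run-length encode b (as.character first, in case b is a factor)
r <- rle(as.numeric(as.character(df1$b)))
# NOTE: this assumes every row is exactly 5 minutes apart,
# so a run of >= 4 zeros covers >= 20 minutes
p <- which(r$values == 0 & r$lengths >= 4)
# last and first row index of each run
w <- cumsum(r$lengths)
s <- w - r$lengths + 1
# row indices of all runs to drop (also works when the first run qualifies)
o <- unlist(lapply(p, function(x) s[x]:w[x]))
# guard: with no qualifying runs, df1[-o, ] would return zero rows
if (length(o) > 0) df1[-o, ] else df1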

#                      a  b
# 1  2012-12-06 06:00:00  1
# 2  2012-12-06 06:05:00  0
# 3  2012-12-06 06:10:00  0
# 4  2012-12-06 06:15:00  0
# 5  2012-12-06 06:20:00  2
# 6  2012-12-06 06:25:00  0
# 7  2012-12-06 06:30:00  0
# 8  2012-12-06 06:35:00  0
# 9  2012-12-06 06:40:00  3
# 15 2012-12-06 07:10:00  1
# 16 2012-12-06 07:15:00  2
# 17 2012-12-06 07:20:00  5
# 18 2012-12-06 07:25:00  6
# 23 2012-12-06 07:50:00  6
# 24 2012-12-06 07:55:00 10
# 25 2012-12-06 08:00:00  2
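
To see why this works, it can help to look at what rle returns for the b column. The following is a minimal sketch using only base R and the values from the question; the names b_num, runs, ends, starts and long_zero are introduced here for illustration and are not part of the answer above. It assumes, like the answer, that rows are evenly spaced at 5 minutes.

```r
# Values of column b from the question, converted to numeric
b_num <- as.numeric(c("1","0","0","0","2","0","0","0","3","0","0","0","0","0",
                      "1","2","5","6","0","0","0","0","6","10","2"))

runs   <- rle(b_num)               # one entry per run of identical values
ends   <- cumsum(runs$lengths)     # last row index of each run
starts <- ends - runs$lengths + 1  # first row index of each run

# Runs of zeros that are at least 4 readings long
long_zero <- runs$values == 0 & runs$lengths >= 4

# Summary of the qualifying runs: rows 10-14 and rows 19-22
data.frame(value  = runs$values,
           length = runs$lengths,
           start  = starts,
           end    = ends)[long_zero, ]
```

The start and end columns give exactly the row ranges that are missing from the output above.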

Upvotes: 2

flodel

Reputation: 89097

Another one, still using rle:

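# TRUE where b is zero
is.zero <- df1$b == 0
is.zero.rle <- rle(is.zero)
# label each row with the length of its run, zero out the non-zero rows,
# then keep rows whose zero run is shorter than 4 readings (20 minutes)
df1[rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero < 4, ]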

It might be easier to understand if I show the intermediate result:

rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero
# [1] 0 3 3 3 0 3 3 3 0 5 5 5 5 5 0 0 0 0 4 4 4 4 0 0 0
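
If this is needed in more than one place, the same logic could be wrapped in a small function. This is only a sketch: drop_zero_runs, its arguments col and min_run, and the toy data frame are hypothetical names made up for illustration. It assumes evenly spaced rows and no missing values in the column.

```r
# Hypothetical helper (not from the answers above): drop rows that belong to
# a run of at least `min_run` consecutive zeros in column `col`.
drop_zero_runs <- function(df, col = "b", min_run = 4) {
  # as.character first so that factor columns convert correctly
  is_zero <- as.numeric(as.character(df[[col]])) == 0
  runs <- rle(is_zero)
  # length of the run each row sits in
  run_len <- rep(runs$lengths, runs$lengths)
  df[!(is_zero & run_len >= min_run), , drop = FALSE]
}

# Made-up example: a run of 2 zeros (kept) and a run of 4 zeros (dropped)
toy <- data.frame(id = 1:9,
                  b  = c("1","0","0","3","0","0","0","0","2"))
kept <- drop_zero_runs(toy)
kept$id
```

With readings every 5 minutes, min_run = 4 corresponds to the 20-minute threshold in the question; a different sampling interval would need a different value.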

Upvotes: 3
