C_psy

Reputation: 647

Subsetting data conditional on first instance in R

data:

row  A  B
1    1  1
2    1  1
3    1  2
4    1  3
5    1  1
6    1  2
7    1  3

Hi all! What I'm trying to do (example above) is sum the values in column A, but only where column B equals 1. A simple subset gets me started:

sum(data$A[data$B==1])

However, I only want to sum over the first run of rows where that condition holds, stopping as soon as B changes. If the condition recurs later in the column (row 5 in the example), I'm not interested in it!

I'd really appreciate your help with this (I suspect simple) problem!

Upvotes: 4

Views: 929

Answers (3)

mnel

Reputation: 115382

Using data.table for syntax elegance, you can use rle to number the runs where B == 1 and then keep only the first one:

library(data.table)
DT <- data.table(data)
DT[ ,B1 := {
  bb <- rle(B==1)
  r <- bb$values
  r[r] <- seq_len(sum(r))
  bb$values <- r
  inverse.rle(bb)
} ]

DT[B1 == 1, sum(A)]
# [1] 2
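
For anyone who wants to see the intermediate values, here is a minimal base R sketch of the same run-length idea. It assumes the data frame from the question is rebuilt by hand as below; the names runs, run_id, per_row and first_run_sum are only illustrative and do not come from the answer above.

```r
# Example data as posted in the question (assumed: A is all 1s)
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

# Run-length encode the condition
runs <- rle(data$B == 1)
# runs$values:  TRUE FALSE TRUE FALSE
# runs$lengths: 2    2     1    2

# Number the TRUE runs 1, 2, ... and set the FALSE runs to 0
run_id <- cumsum(runs$values) * runs$values

# Expand the run numbers back to one value per row
per_row <- rep(run_id, runs$lengths)

# Sum A over the first TRUE run only
first_run_sum <- sum(data$A[per_row == 1])
first_run_sum
```

Selecting per_row == 2 instead would pick out the second run (row 5 in this data).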

Upvotes: 1

Arun

Reputation: 118779

Another way:

idx <- which(data$B == 1)
sum(data$A[idx[idx == (seq_along(idx) + idx[1] - 1)]])
# [1] 2

# or alternatively
sum(data$A[idx[idx == seq(idx[1], length.out = length(idx))]])
# [1] 2

The idea: first get all the indices where B is 1. For the data shown, that is c(1,2,5). Starting from the first index (1 here), you want only the indices that are consecutive (that is, c(1,2,3,...)). So generate as many consecutive numbers as there are indices, starting from that first index, and compare them element-wise with idx. The two stop being equal the moment idx skips a value, and once there is one mismatch every later position mismatches as well. The leading positions where they agree are therefore exactly the first consecutive run, which is what you want.
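
As a rough illustration of that comparison, here is the second variant split into steps. It assumes the data frame is rebuilt as posted in the question; expected and keep are hypothetical names used only for this sketch.

```r
# Example data as posted in the question (assumed: A is all 1s)
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

# Row indices where B is 1
idx <- which(data$B == 1)

# The indices we would see if every match were consecutive,
# starting from the first match
expected <- seq(idx[1], length.out = length(idx))

# Keep only the positions where the actual and expected indices agree
keep <- idx[idx == expected]

sum(data$A[keep])
```

With this data idx is 1 2 5 and expected is 1 2 3, so only rows 1 and 2 are kept.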

Upvotes: 1

eddi

Reputation: 49448

Here's a rather elaborate way of doing that:

data$counter = cumsum(data$B == 1)
sum(data$A[(data$counter >= 1:nrow(data) - sum(data$counter == 0)) &
           (data$counter != 0)])
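
A sketch of how this seems to work, with the intermediate vectors pulled out. It assumes the question's data is rebuilt as below; offset and in_first_run are illustrative names that are not part of the code above.

```r
# Example data as posted in the question (assumed: A is all 1s)
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

# Running count of rows seen so far with B == 1
counter <- cumsum(data$B == 1)

# Number of leading rows before the first B == 1 (none in this data)
offset <- sum(counter == 0)

# Inside the first run the counter grows by one per row, so it keeps pace
# with the row number; after the first non-1 row it falls behind for good
in_first_run <- (counter >= seq_len(nrow(data)) - offset) & (counter != 0)

sum(data$A[in_first_run])
```

The counter != 0 part drops any rows that come before the first B == 1, which the first comparison alone would let through.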

Upvotes: 1
