SimoneG
SimoneG

Reputation: 29

Conditional subset based on one occurrence

I would like to subset a database like this in two parts:

df <- data.frame(ID = c(1,1,1,1,1,2,2,2,3,3,3), 
ins =c(1,2,3,4,5,3,2,4,5,8,9),
Ytx = c(NA,NA,1998,NA,NA,NA,NA,NA,NA,2011,NA))

ID ins  Ytx
  1   1   NA
  1   2   NA
  1   3 1998
  1   4   NA
  1   5   NA
  2   3   NA
  2   2   NA
  2   4   NA
  3   5   NA
  3   8 2011
  3   9   NA

The first should look like this (all the values coming after the occurrence of Ytx grouped by ID):

 ID ins  Ytx
  1   3 1998
  1   4   NA
  1   5   NA
  3   8 2011
  3   9   NA

and the other one made by what is left:

ID ins  Ytx
  1   1   NA
  1   2   NA
  2   3   NA
  2   2   NA
  2   4   NA
  3   5   NA

Thank you for your support

Upvotes: 0

Views: 66

Answers (3)

Jaap
Jaap

Reputation: 83235

Two other alternatives with :

# convert 'df' tot a 'data.table'
library(data.table)
setDT(df)

# alternative 1
split(df, df[, !!cumsum(!is.na(Ytx)), by = ID]$V1)

# alternative 2
split(df, df[, !!Reduce(`+`, !is.na(Ytx), accumulate = TRUE), by = ID]$V1)

which both give:

$`FALSE`
   ID ins Ytx
1:  1   1  NA
2:  1   2  NA
3:  2   3  NA
4:  2   2  NA
5:  2   4  NA
6:  3   5  NA

$`TRUE`
   ID ins  Ytx
1:  1   3 1998
2:  1   4   NA
3:  1   5   NA
4:  3   8 2011
5:  3   9   NA

Upvotes: 2

chinsoon12
chinsoon12

Reputation: 25225

Here is an option using data.table:

setDT(df)[, rn := .I]
idx <- df[, 
    if (any(!is.na(Ytx))) 
        .I[seq(.N) >= match(TRUE, !is.na(Ytx))]
, ID]$V1

df[idx]:

   ID ins  Ytx rn
1:  1   3 1998  3
2:  1   4   NA  4
3:  1   5   NA  5
4:  3   8 2011 10
5:  3   9   NA 11

df[-idx]:

   ID ins Ytx rn
1:  1   1  NA  1
2:  1   2  NA  2
3:  2   3  NA  6
4:  2   2  NA  7
5:  2   4  NA  8
6:  3   5  NA  9

data:

library(data.table)
df <- data.frame(ID = c(1,1,1,1,1,2,2,2,3,3,3), 
    ins =c(1,2,3,4,5,3,2,4,5,8,9),
    Ytx = c(NA,NA,1998,NA,NA,NA,NA,NA,NA,2011,NA))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389012

In base R, we can use ave and split. This will divide data into list of two. First one is the data before value of Ytx and the second one is rows after Ytx.

split(df, with(df, ave(!is.na(Ytx), ID, FUN = cumsum)))
#Or if you may have multiple Ytx per ID
#split(df, with(df, ave(!is.na(Ytx), ID, FUN = cumsum)) > 0)

#$`0`
#  ID ins Ytx
#1  1   1  NA
#2  1   2  NA
#6  2   3  NA
#7  2   2  NA
#8  2   4  NA
#9  3   5  NA

#$`1`
#   ID ins  Ytx
#3   1   3 1998
#4   1   4   NA
#5   1   5   NA
#10  3   8 2011
#11  3   9   NA

Upvotes: 2

Related Questions