Diego Menezes

Reputation: 85

How to "IF" and "NEXT" in R to skip rows in a dataframe column?

I have a dataframe in R with "N" rows that looks like this:

[image: the dataframe]

My idea is to write a loop in R (perhaps containing "IF" and "NEXT") that will skip the remaining rows of an ID whenever it finds a Value = 0 after a Value = 1. For instance, for ID1 I'd only save up to its 4th row (Value = 1) and skip the remaining rows (Value = 0). Then I'd go to ID10, where I'd save up to its 2nd row (Value = 1) and skip all the others, then to ID2, where I'd save only the first row (Value = 1) and skip the others, then to ID3, and so on.

Currently, I have something that looks like this loop:

[image: the current loop]

Any ideas on how I can accomplish that?

Thanks a bunch, Diego.

Upvotes: 0

Views: 1664

Answers (1)

akrun

Reputation: 886998

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Then, grouped by 'IDs', get the index of the first maximum value in 'value' (which.max), build the sequence up to it (i.e. if the first 1 is at position 5, the sequence is 1:5), index .I with that sequence to get the row numbers in the whole dataset, extract that column ($V1), and use it to subset the dataset.

library(data.table)
setDT(df1)[df1[, .I[seq(which.max(value))], by = IDs]$V1]
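As a quick check, here is the same one-liner on a small made-up dataset. The original data was only posted as an image, so the rows below are hypothetical; only the column names 'IDs' and 'value' come from the code above.

```r
library(data.table)

# Hypothetical sample data (not the data from the question)
df1 <- data.frame(
  IDs   = c("ID1", "ID1", "ID1", "ID1", "ID2", "ID2", "ID2"),
  value = c(0, 0, 1, 0, 1, 0, 0)
)

setDT(df1)  # convert to data.table by reference

# Keep rows of each ID up to and including its first 1
res <- df1[df1[, .I[seq(which.max(value))], by = IDs]$V1]
print(res)
# ID1 keeps its rows 1 to 3 (values 0, 0, 1); ID2 keeps only its first row (value 1)
```

Note that which.max returns the position of the first maximum, so this keeps everything up to the first 1 within each ID and drops whatever follows it.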

It is not clear whether a particular 'IDs' group can have only 0s for 'value'. If we need to skip such groups, use an if condition

setDT(df1)[df1[, if(any(value!=0)) .I[seq(which.max(value))], by = IDs]$V1]
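To see why the if condition matters, here is a small sketch with a made-up group 'B' that contains only 0s (the data and the object names df2, keep_all and skip_zero are hypothetical). Without the condition, which.max returns 1 for an all-zero group, so its first row is kept; with the condition, the group returns nothing and drops out.

```r
library(data.table)

# Hypothetical data: group "B" has no 1 at all
df2 <- data.table(
  IDs   = c("A", "A", "A", "B", "B"),
  value = c(0, 1, 0, 0, 0)
)

# Without the check: the first row of "B" is kept (which.max of all 0s is 1)
keep_all <- df2[df2[, .I[seq(which.max(value))], by = IDs]$V1]

# With the check: "B" yields NULL for its group and is skipped entirely
skip_zero <- df2[df2[, if (any(value != 0)) .I[seq(which.max(value))], by = IDs]$V1]

print(keep_all)   # 3 rows: two from "A", one from "B"
print(skip_zero)  # 2 rows: both from "A"
```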

To understand the syntax we can split up the process

setDT(df1) #converts the `data.frame` to `data.table`

Now, we can do the process to get the rowids. In the below code, we group by 'IDs', get the index of the first element that is 1 for 'value' (which.max(value)), then do the sequence (seq(...)), and find the row index in the whole dataset (.I[...]).

df1[, .I[seq(which.max(value))] , by = IDs]

The above returns a dataset with the columns 'IDs' and a default column 'V1' holding the row index (as we didn't specify a column name)
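For illustration, this is what that intermediate result looks like on a tiny made-up dataset (the data and the names dt and idx are hypothetical):

```r
library(data.table)

# Hypothetical data, only to show the shape of the intermediate result
dt <- data.table(
  IDs   = c("ID1", "ID1", "ID1", "ID2", "ID2"),
  value = c(0, 1, 0, 1, 0)
)

idx <- dt[, .I[seq(which.max(value))], by = IDs]
print(idx)
# Two columns: 'IDs' (the grouping column) and 'V1' (row numbers in dt)
# V1 is 1, 2 for ID1 and 4 for ID2
print(names(idx))
```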

If we are extracting that column, use $V1 or [["V1"]]

i1 <- df1[, .I[seq(which.max(value))] , by = IDs]$V1

The row index ('i1') object is used to subset the rows of the initial dataset

df1[i1]

Upvotes: 1
