Reputation: 85
I have a dataframe in R with "N" rows that looks like this:
My idea is to write a loop in R (perhaps containing "IF" and "NEXT") that skips the remaining rows of an ID whenever it finds a Value = 0 after a Value = 1. For instance, for ID1 I'd only save up to its 4th row (Value = 1) and skip the remaining rows (Value = 0), then go to ID10, where I'd save up to its 2nd row (Value = 1) and skip all others, then go to ID2, where I'd save only the first row (Value = 1) and skip the others, then go to ID3, and so on.
Currently, I have something that looks like this loop:
Any ideas on how I can accomplish that?
Thanks a bunch, Diego.
Upvotes: 0
Views: 1664
Reputation: 886998
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)). Grouped by 'IDs', we get the index of the first maximum value in 'value' (which.max), build the sequence up to it (i.e. if the first 1 is at position 5, the sequence is 1:5), wrap it with .I to get the row index, extract the column ($V1) and subset the dataset.
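The per-group logic described above can be checked on a single vector in base R. This is only a sketch: the `value` vector below is made up for illustration and stands for the 'value' column of one ID.

```r
# Hypothetical 'value' vector for a single ID (not from the original post)
value <- c(0, 0, 0, 1, 0, 0)

# which.max() returns the position of the first maximum, here the first 1
first_one <- which.max(value)

# seq() on that position gives the positions to keep: 1, 2, ..., first_one
keep <- seq(first_one)

# Everything up to and including the first 1 is kept, the trailing 0s are dropped
value[keep]
```

Inside data.table the same positions are used to index .I, which turns the within-group positions into row numbers of the whole dataset.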
library(data.table)
setDT(df1)[df1[, .I[seq(which.max(value))], by = IDs]$V1]
It is not clear whether some 'IDs' have only 0s for 'value'. If we need to skip those 'IDs', use an if condition
setDT(df1)[df1[, if(any(value!=0)) .I[seq(which.max(value))], by = IDs]$V1]
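To see why the guard matters, here is a small sketch on made-up data in which one ID has only 0s. The data frame contents are an assumption for illustration; only the two data.table expressions come from the answer.

```r
library(data.table)

# Hypothetical data: ID "b" has only 0s in 'value'
df1 <- data.frame(
  IDs   = c("a", "a", "a", "b", "b", "c", "c"),
  value = c(0, 1, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)
setDT(df1)

# Without the guard: which.max() returns 1 for an all-zero vector,
# so the first row of "b" is still kept
no_guard <- df1[df1[, .I[seq(which.max(value))], by = IDs]$V1]

# With the guard: groups without any non-zero 'value' return NULL
# and therefore contribute no rows
with_guard <- df1[df1[, if (any(value != 0)) .I[seq(which.max(value))], by = IDs]$V1]

no_guard$IDs
with_guard$IDs
```

Which variant is appropriate depends on whether an all-zero ID should appear in the result at all.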
To understand the syntax we can split up the process
setDT(df1) #converts the `data.frame` to `data.table`
Now, we can get the row ids. In the code below, we group by 'IDs', get the index of the first element that is 1 for 'value' (which.max(value)), then build the sequence (seq(...)), and find the row index in the whole dataset (.I[...]).
df1[, .I[seq(which.max(value))] , by = IDs]
The above returns a dataset with columns 'IDs' and a default column 'V1' for the row index (as we didn't specify the column name).
To extract that column, use $V1 or [["V1"]]
i1 <- df1[, .I[seq(which.max(value))] , by = IDs]$V1
The row index ('i1') object is used to subset the rows of the initial dataset
df1[i1]
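The split-up steps can be traced end to end on a small dataset. Since the original data is not shown, the data frame below is hypothetical; the names `idx` and `res` are introduced here only to hold the intermediate results.

```r
library(data.table)

# Hypothetical data (the data from the question is not shown)
df1 <- data.frame(
  IDs   = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
  value = c(0, 1, 0, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)
setDT(df1)  # convert to data.table by reference

# Step 1: per-ID row indices up to and including the first 1
idx <- df1[, .I[seq(which.max(value))], by = IDs]
names(idx)  # the unnamed result column gets the default name V1

# Step 2: extract the row index vector
i1 <- idx$V1

# Step 3: subset the original rows
res <- df1[i1]
res
```

Here ID1 keeps its first two rows, ID2 keeps only its first row, and ID3 keeps all three, because its first 1 is in the last row.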
Upvotes: 1