Biostats
Biostats

Reputation: 51

how to choose a row when the row number of that row is equal to the value of the other column with duplicates in R?

I have a data frame as follows -

df <- cbind(c(1,1,1,2,2,2,3,3,3,3), c(6,12,18,3,9,12,4,8,12,16),c(3,3,3,2,2,2,4,4,4,4))
colnames(df) <- c("ID","value","index")

I want to get the following result -

df1 <- cbind(c(1,2,3), c(18,9,16),c(3,2,4))

So I basically want to extract (for each ID) the row whose row number is equal to the index for that ID. For example, the 3rd row for ID 1, 2nd row for ID 2 and 4th row for ID 4.

I tried the following code

df1 <- df%>%group_by(ID)%>%filter(index==index)

But it is not working. Please help me to solve this problem.

Upvotes: 1

Views: 44

Answers (2)

Hong
Hong

Reputation: 594

Just adding to the Ronak Shah's answer, I guess one of the straightforward codes to do what you want is the following:

library(dplyr)
df <- 
    data.frame(ID = c(1,1,1,2,2,2,3,3,3,3), value = c(6,12,18,3,9,12,4,8,12,16), index = c(3,3,3,2,2,2,4,4,4,4))

df %>% group_by(ID) %>% filter(row_number() == index) %>% ungroup

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

Use slice to select the index row for each ID.

library(dplyr)
df %>% group_by(ID) %>% slice(first(index)) %>% ungroup

#     ID value index
#  <dbl> <dbl> <dbl>
#1     1    18     3
#2     2     9     2
#3     3    16     4

This can be written in data.table and base R as :

library(data.table)
setDT(df)[, .SD[first(index)], ID]

#Base R
subset(df, index == ave(value, ID, FUN = seq_along))

data

df <- data.frame(ID = c(1,1,1,2,2,2,3,3,3,3), 
                 value = c(6,12,18,3,9,12,4,8,12,16),
                 index = c(3,3,3,2,2,2,4,4,4,4))

Upvotes: 2

Related Questions