Yung Gud
Yung Gud

Reputation: 87

Remove ID:s with only one observation in time in r

Hi I have panel data and would like to remove any individuals that only have observations at one time point and keep the ones that have 2 points in time.

so the dataframe:

df <- data.frame(id = c(1,2,2,3,3,4,4,5,6), time = c(1,1,2,1,2,1,2,2,2))

   id time
1  1    1
2  2    1
3  2    2
4  3    1
5  3    2
6  4    1
7  4    2
8  5    2
9  6    2

becomes this:

   id time
1  2    1
2  2    2
3  3    1
4  3    2
5  4    1
6  4    2

i.e removing individual 1, 5 and 6 so that the panel is balansed. Thx

Upvotes: 1

Views: 1594

Answers (4)

user5947301
user5947301

Reputation:

library(data.table)
setDT(df, key = "id")[(duplicated(id) | duplicated(id, fromLast = TRUE))]
#   id time
#1:  2    1
#2:  2    2
#3:  3    1
#4:  3    2
#5:  4    1
#6:  4    2

Upvotes: 1

akrun
akrun

Reputation: 887421

We can do this using a couple of options. With data.table, convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', we get the number of rows (.N) and if that is greater than 1, get the Subset of Data.table (.SD)

library(data.table)
setDT(df)[, if(.N>1) .SD, by = id]
#   id time
#1:  2    1
#2:  2    2
#3:  3    1
#4:  3    2
#5:  4    1
#6:  4    2

Can use the same methodology with dplyr.

library(dplyr)
df %>% 
   group_by(id) %>%
   filter(n()>1)
#    id  time
#   (dbl) (dbl)
#1     2     1
#2     2     2
#3     3     1
#4     3     2
#5     4     1
#6     4     2

Or with base R, get the table of data.frame, check whether it is greater than 1, subset the names based on the logical index ('i1') and use it to subset the 'data.frame' using %in%.

 i1 <- table(df$id)>1
 subset(df, id %in% names(i1)[i1] )

Upvotes: 3

Sotos
Sotos

Reputation: 51592

Another option,

ind <- rle(df$id)$values[rle(df$id)$lengths > 1]
df[df$id %in% ind,]
#  id time
#2  2    1
#3  2    2
#4  3    1
#5  3    2
#6  4    1
#7  4    2

Upvotes: 1

Koundy
Koundy

Reputation: 5503

You can use dplyr package to do this

 library(dplyr)
    df %>% group_by(id,time) %>% summarize(count = n()) %>%
           filter(!count == 1)

Upvotes: 0

Related Questions