Reputation: 87
Hi I have panel data and would like to remove any individuals that only have observations at one time point and keep the ones that have 2 points in time.
so the dataframe:
df <- data.frame(id = c(1,2,2,3,3,4,4,5,6), time = c(1,1,2,1,2,1,2,2,2))
id time
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 4 1
7 4 2
8 5 2
9 6 2
becomes this:
id time
1 2 1
2 2 2
3 3 1
4 3 2
5 4 1
6 4 2
i.e removing individual 1, 5 and 6 so that the panel is balansed. Thx
Upvotes: 1
Views: 1594
Reputation:
library(data.table)
setDT(df, key = "id")[(duplicated(id) | duplicated(id, fromLast = TRUE))]
# id time
#1: 2 1
#2: 2 2
#3: 3 1
#4: 3 2
#5: 4 1
#6: 4 2
Upvotes: 1
Reputation: 887421
We can do this using a couple of options. With data.table
, convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', we get the number of rows (.N
) and if
that is greater than 1, get the Subset of Data.table (.SD
)
library(data.table)
setDT(df)[, if(.N>1) .SD, by = id]
# id time
#1: 2 1
#2: 2 2
#3: 3 1
#4: 3 2
#5: 4 1
#6: 4 2
Can use the same methodology with dplyr
.
library(dplyr)
df %>%
group_by(id) %>%
filter(n()>1)
# id time
# (dbl) (dbl)
#1 2 1
#2 2 2
#3 3 1
#4 3 2
#5 4 1
#6 4 2
Or with base R
, get the table
of data.frame, check whether it is greater than 1, subset the names
based on the logical index ('i1') and use it to subset
the 'data.frame' using %in%
.
i1 <- table(df$id)>1
subset(df, id %in% names(i1)[i1] )
Upvotes: 3
Reputation: 51592
Another option,
ind <- rle(df$id)$values[rle(df$id)$lengths > 1]
df[df$id %in% ind,]
# id time
#2 2 1
#3 2 2
#4 3 1
#5 3 2
#6 4 1
#7 4 2
Upvotes: 1
Reputation: 5503
You can use dplyr
package to do this
library(dplyr)
df %>% group_by(id,time) %>% summarize(count = n()) %>%
filter(!count == 1)
Upvotes: 0