David Z

Reputation: 7041

How to extract paired rows from a data frame in R

I have a large data frame in which most subjects have a pair of observations, like this:

set.seed(123)
df <- data.frame(ID = c(letters[1:4], letters[1:6]), x = sample(1:5, 10, replace = TRUE))
   ID x
1   a 2
2   b 4
3   c 3
4   d 5
5   a 5
6   b 1
7   c 3
8   d 5
9   e 3
10  f 3

I'd like to extract only the rows whose ID is paired, i.e. appears twice, such as:

  ID x
1  a 2
5  a 5
2  b 4
6  b 1
3  c 3
7  c 3
4  d 5
8  d 5

What's the best way to do that in R?

Upvotes: 0

Views: 252

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193537

You can use ave to count how many times each value of df$ID occurs, and then use that count to subset your data.frame:

out <- df[as.numeric(ave(as.character(df$ID), df$ID, FUN = length)) == 2, ]
out
#   ID x
# 1  a 2
# 2  b 4
# 3  c 3
# 4  d 5
# 5  a 5
# 6  b 1
# 7  c 3
# 8  d 5

Use order to sort the output if required.

out[order(out$ID), ]
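
To make the ave step easier to follow, here is a small sketch that pulls the per-row counts into their own vector first. It is not the exact code above: it passes seq_along(df$ID) to ave so the counts come back as integers without the as.character/as.numeric round trip, the name n_per_id is made up for illustration, and x is typed in by hand (copied from the question's printout) so the result does not depend on the random number generator.

```r
# Same data as in the question, with x entered explicitly (an assumption
# made here so the example is reproducible on any R version).
df <- data.frame(ID = c(letters[1:4], letters[1:6]),
                 x = c(2, 4, 3, 5, 5, 1, 3, 5, 3, 3))

# For every row, the number of rows that share its ID.
# n_per_id is a hypothetical helper name, not part of the original answer.
n_per_id <- ave(seq_along(df$ID), df$ID, FUN = length)
n_per_id

# Keep rows whose ID occurs exactly twice, then sort by ID.
paired <- df[n_per_id == 2, ]
paired <- paired[order(paired$ID), ]
paired
```

After sorting, the rows come out in the order 1, 5, 2, 6, 3, 7, 4, 8, which is the layout asked for in the question.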

You can also look into using data.table:

library(data.table)
dt <- data.table(df, key = "ID") # Setting the key also sorts by ID
dt[, n := .N, by = "ID"][n == 2] # n = rows per ID; keep IDs with exactly two

Upvotes: 2

joran

Reputation: 173577

Alternatively, I tend to use duplicated:

> df[df$ID %in% df$ID[duplicated(df$ID)],]
  ID x
1  a 2
2  b 4
3  c 3
4  d 5
5  a 5
6  b 1
7  c 3
8  d 5
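
For anyone unsure why this works, here is a rough breakdown of the one-liner. One thing to be aware of: it keeps every ID that occurs two or more times, not exactly twice. The names dup_ids, repeated, df3, at_least_twice and exactly_twice, and the extra "a" row, are made up for illustration, and x is typed in from the question's printout so the example is reproducible.

```r
# Same data as in the question, with x entered explicitly (an assumption
# made here so the example does not depend on the random number generator).
df <- data.frame(ID = c(letters[1:4], letters[1:6]),
                 x = c(2, 4, 3, 5, 5, 1, 3, 5, 3, 3))

# duplicated() flags only the second and later occurrences of each ID,
# so this gives the set of IDs that appear more than once.
dup_ids <- unique(df$ID[duplicated(df$ID)])

# %in% then pulls in every row for those IDs, including the first occurrence.
repeated <- df[df$ID %in% dup_ids, ]
repeated

# Hypothetical extra row: "a" now appears three times.
df3 <- rbind(df, data.frame(ID = "a", x = 1))

# The duplicated() idiom still keeps all three "a" rows ...
at_least_twice <- df3[df3$ID %in% df3$ID[duplicated(df3$ID)], ]

# ... whereas counting rows per ID and testing == 2 drops them.
exactly_twice <- df3[ave(seq_along(df3$ID), df3$ID, FUN = length) == 2, ]
```

On the data in the question both approaches select the same eight rows; they only differ once some ID shows up three or more times.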

Upvotes: 2
