rgeorge
rgeorge

Reputation: 31

Extract rows with duplicated values in one column only if corresponding values in another column are also duplicated in r

I am trying to extract rows ONLY IF they have duplicated values in the first and second columns (x1 and x2). In other words, extracting duplicated rows in the first column(x1) ONLY IF the corresponding rows in the second column (x2) are all duplicates.

dt

x1 x2 x3
1 a 2.1
1 a 3.4
1 b 4
2 c 5.5
2 c 4.1
2 d 5
3 e 2.4
3 e 7
4 f 1.5
4 f 4.4
4 f 2.1
5 g 7.8

I tried to use:

dupe = dt[,c('x1','x2')] 
dt[duplicated(dupe) | duplicated(dupe, fromLast=TRUE),]

However the results are different to what I want. My desired database SHOULD EXCLUDE X1=1 because corresponding x2 a=a≠b, the same applies for x1=2 (corresponding x2 are not ALL duplicates c=c≠d).

x1 x2 x3
1 a 2.1
1 a 3.4
2 c 5.5
2 c 4.1
3 e 2.4
3 e 7
4 f 1.5
4 f 4.4
4 f 2.2

My DESIRED database should include the following:

x1 x2 x3
3 e 2.4
3 e 7
4 f 1.5
4 f 4.4
4 f 2.2

Any solutions please?

Upvotes: 2

Views: 552

Answers (1)

Onyambu
Onyambu

Reputation: 79188

df[duplicated(df[-3])|duplicated(df[-3], fromLast = TRUE), ]


   x1 x2  x3
1   1  a 2.1
2   1  a 3.4
4   2  c 5.5
5   2  c 4.1
7   3  e 2.4
8   3  e 7.0
9   4  f 1.5
10  4  f 4.4
11  4  f 2.1

Upvotes: 2

Related Questions