user_n

Reputation: 53

Keep duplicate columns with different names in R

I would like to keep the duplicated columns and drop the columns that are unique. The duplicated columns have the same values but different names.

x1 = rnorm(10)
x2 = rnorm(10)
x3 = x1   # copy of x1
x4 = rnorm(10)
x5 = x2   # copy of x2
x6 = rnorm(10)
x7 = rnorm(10)
df = data.frame(x1, x2, x3, x4, x5, x6, x7)

From here I would keep columns x1, x2, x3, and x5.

There is also a similar question for Python: "Get rows that have the same value across its columns in pandas"

Upvotes: 2

Views: 282

Answers (1)

thelatemail

Reputation: 93813

Use duplicated on a transposed version of your data, since the function by default checks for duplication of rows, not columns.

df[duplicated(t(df)) | duplicated(t(df), fromLast=TRUE)]

#            x1         x2          x3         x5
#1   1.82633666  1.2271611  1.82633666  1.2271611
#2  -1.33187496  0.9654359 -1.33187496  0.9654359
#...
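
To see why the two calls are combined: duplicated() on its own flags only the second and later occurrences, so a single call would keep x3 and x5 but drop x1 and x2. The fromLast=TRUE pass flags the earlier copies instead. A minimal sketch on made-up data (the toy data frame and the names fwd and bwd are illustrative, not from the question):

```r
# Toy data with known values: columns a and c are identical, b is unique
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(1, 2, 3))

# After t(), each original column is a row, which is what duplicated() compares
fwd <- duplicated(t(toy))                    # flags later copies:   FALSE FALSE TRUE
bwd <- duplicated(t(toy), fromLast = TRUE)   # flags earlier copies: TRUE FALSE FALSE

# A column is kept if either pass flags it
names(toy)[fwd | bwd]   # "a" "c"
```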

As @Frank notes, you could also treat df as a list of vectors:

df[duplicated(c(df)) | duplicated(c(df), fromLast=TRUE)]

Or you could explicitly call the array method, specifying columns to be checked for duplicates:

df[duplicated.array(df, MARGIN=2) | duplicated.array(df, MARGIN=2, fromLast=TRUE)]
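
If you need this more than once, any of the variants above could be wrapped in a small function. The sketch below uses the list-based variant; the name keep_duplicated_cols is made up for illustration, and set.seed() is only there to make the example reproducible:

```r
# Hypothetical helper: keep only the columns whose values also
# appear in at least one other column of the same data frame
keep_duplicated_cols <- function(data) {
  cols <- as.list(data)   # one list element per column
  dup <- duplicated(cols) | duplicated(cols, fromLast = TRUE)
  data[dup]               # logical index selects columns
}

# Same layout as the data in the question
set.seed(1)
x1 <- rnorm(10); x2 <- rnorm(10); x4 <- rnorm(10)
x6 <- rnorm(10); x7 <- rnorm(10)
df <- data.frame(x1, x2, x3 = x1, x4, x5 = x2, x6, x7)

res <- keep_duplicated_cols(df)
names(res)
```

With the data above, names(res) should list x1, x2, x3 and x5. When no column is repeated, the function returns a data frame with zero columns.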

Upvotes: 5
