Reputation: 53
I would like to keep the duplicated columns and delete the columns that are unique. The duplicated columns have the same values but different names.
x1 = rnorm(10)
x2 = rnorm(10)
x3 = x1  # duplicate of x1
x4 = rnorm(10)
x5 = x2  # duplicate of x2
x6 = rnorm(10)
x7 = rnorm(10)
df = data.frame(x1,x2,x3,x4,x5,x6,x7)
From here I would keep columns x1, x2, x3, and x5.
There is also a similar question for Python: Get rows that have the same value across its columns in pandas
Upvotes: 2
Views: 282
Reputation: 93813
Use duplicated() on a transposed version of your data, since the function by default checks for duplication of rows, not columns.
df[duplicated(t(df)) | duplicated(t(df), fromLast=TRUE)]
# x1 x2 x3 x5
#1 1.82633666 1.2271611 1.82633666 1.2271611
#2 -1.33187496 0.9654359 -1.33187496 0.9654359
#...
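To see why both duplicated() calls are combined, here is a minimal sketch with fixed values. The data frame toy and its column names are made up for illustration. On its own, duplicated() flags only the second and later copies, so the fromLast=TRUE pass is needed to catch the first copy as well.

```r
# Hypothetical toy data: column b is a copy of a; column c is unique
toy <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(4, 5, 6))

fwd <- duplicated(t(toy))                   # flags later copies only (here: b)
bwd <- duplicated(t(toy), fromLast = TRUE)  # flags earlier copies only (here: a)
keep <- fwd | bwd                           # TRUE for every column that has a twin

result <- toy[keep]                         # keeps a and b, drops c
```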
As @Frank notes, you could also have df be treated like a list of vectors:
df[duplicated(c(df)) | duplicated(c(df), fromLast=TRUE)]
Or you could explicitly call the array method, specifying columns to be checked for duplicates:
df[duplicated.array(df, MARGIN=2) | duplicated.array(df, MARGIN=2, fromLast=TRUE)]
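If you need this more than once, the list-based variant can be wrapped in a small function. This is only a sketch: the name keep_dup_cols is hypothetical, and the seed and the shortened data frame are arbitrary choices made so the example is reproducible.

```r
# Hypothetical helper (not part of base R): keep only the columns whose
# values also appear in at least one other column
keep_dup_cols <- function(d) {
  cols <- as.list(d)  # treat the data frame as a list of column vectors
  d[duplicated(cols) | duplicated(cols, fromLast = TRUE)]
}

set.seed(1)  # arbitrary seed, only for reproducibility
x1 <- rnorm(10)
x2 <- rnorm(10)
df <- data.frame(x1, x2, x3 = x1, x4 = rnorm(10), x5 = x2)

kept <- names(keep_dup_cols(df))  # "x1" "x2" "x3" "x5"
```

One reason to prefer the list form: t(df) goes through a matrix, so a data frame with mixed column types is coerced to a single common type before the comparison, whereas the list form compares the columns as they are.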
Upvotes: 5