Reputation: 1295
I have a data frame
A B C D E F
1 2 3 3 4 1
2 3 5 5 8 2
6 4 9 9 0 6
1 2 3 3 8 1
I want the names of the columns that hold identical values.
Preferred output: A, F, C, D
Upvotes: 2
Views: 332
Reputation: 33603
You could convert the data.frame to a list and use duplicated():
names(df)[duplicated(as.list(df)) | duplicated(as.list(df), fromLast = TRUE)]
# [1] "A" "C" "D" "F"
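To see why both calls are combined with |, here is a minimal sketch that runs each direction separately on the same data. The names cols, fwd and bwd are only illustrative and are not part of the original answer.

```r
df <- data.frame(
  A = c(1L, 2L, 6L, 1L), B = c(2L, 3L, 4L, 2L), C = c(3L, 5L, 9L, 3L),
  D = c(3L, 5L, 9L, 3L), E = c(4L, 8L, 0L, 8L), F = c(1L, 2L, 6L, 1L)
)

cols <- as.list(df)                       # one list element per column

# Scanning left to right flags only the later copy of each repeated column
fwd <- duplicated(cols)
names(df)[fwd]
# [1] "D" "F"

# Scanning right to left flags only the earlier copy
bwd <- duplicated(cols, fromLast = TRUE)
names(df)[bwd]
# [1] "A" "C"

# A column belongs to the result if either scan flags it
res <- names(df)[fwd | bwd]
res
# [1] "A" "C" "D" "F"
```

A single duplicated() call would therefore miss the first member of every group of identical columns.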
You could also call duplicated.default() directly on the data.frame:
names(df)[duplicated.default(df) | duplicated.default(df, fromLast = TRUE)]
# [1] "A" "C" "D" "F"
Data:
df <- data.frame(
A = c(1L, 2L, 6L, 1L), B = c(2L, 3L, 4L, 2L), C = c(3L, 5L, 9L, 3L),
D = c(3L, 5L, 9L, 3L), E = c(4L, 8L, 0L, 8L), F = c(1L, 2L, 6L, 1L)
)
Benchmark:
Converting a data.frame to a list (as.list()) is much more efficient than transposing and converting to a matrix (t()):
microbenchmark::microbenchmark(as.list(df), t(df))
Unit: microseconds
        expr    min      lq     mean median      uq     max neval cld
 as.list(df)  2.677  2.9010  3.84244  3.570  3.5700  28.114   100   a
       t(df) 69.615 71.1765 77.11636 72.293 75.6395 219.554   100   b
Upvotes: 1
Reputation: 7467
Expanding on @Ronak Shah's solution to produce the OP's preferred output:
df <- data.frame(A = c(1,2,6,1), B = c(2,3,4,2), C = c(3,5,9,3), D = c(3,5,9,3), E = c(4,8,0,8), F = c(1,2,6,1))
df <- df[, duplicated(t(df)) | duplicated(t(df), fromLast = TRUE)]
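# Order the remaining columns by their first-row values so identical columns sit side by side
df <- df[order(unlist(df[1, ]))]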
names(df)
[1] "A" "F" "C" "D"
Note that order() is used to rearrange df so that names(df) returns the preferred output.
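If the goal is to keep identical columns next to each other in general, rather than relying on how this particular data happens to sort, one possible sketch is to build a text key per column and group the names by it. The names key and groups are hypothetical. Pasting the values into a string is an assumption that works for this integer data, but it compares printed values, so it would treat 1 and 1L as equal.

```r
df <- data.frame(
  A = c(1L, 2L, 6L, 1L), B = c(2L, 3L, 4L, 2L), C = c(3L, 5L, 9L, 3L),
  D = c(3L, 5L, 9L, 3L), E = c(4L, 8L, 0L, 8L), F = c(1L, 2L, 6L, 1L)
)

# One string per column; identical columns produce identical keys
key <- vapply(df, paste, character(1), collapse = "|")

# Group column names by key, keeping groups in order of first appearance
groups <- split(names(df), factor(key, levels = unique(key)))

# Keep only the groups that contain more than one column
groups <- unname(groups[lengths(groups) > 1])

res <- unlist(groups)
res
# [1] "A" "F" "C" "D"
```

The intermediate groups list also shows which columns match which, here A with F and C with D.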
Upvotes: 2
Reputation: 389235
duplicated() works on each row of a data frame. We can transpose the data frame so that it works on each column instead, then subset the column names.
names(df)[duplicated(t(df)) | duplicated(t(df), fromLast = TRUE)]
#[1] "A" "C" "D" "F"
Upvotes: 1