Reputation: 97
How can I extract the column names (or the row and column indices) of duplicated elements in the following data frame?
V1 V2 V3 V4
PC1 0.5863431 0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068 0.7071068 1.665335e-16 3.885781e-16
For example, 0.5863431 is equal to 0.5863431, so "V1" and "V2" are the column names.
In that dataframe I want to get:
[1] "V1" "V2" "V3" "V4"
As you can see, this is already the result from looking only at the first row.
Second example:
V1 V2 V3 V4
PC1 -0.5987139 -0.5987139 -0.03790446 0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3 0.3986891 0.3523926 -0.11045319 0.8394442
Result:
[1] "V1" "V2"
Upvotes: 6
Views: 1354
Reputation: 1445
With whatever approach you use, be aware of FAQ 7.31 when working with floating point numbers. You may want to create a new matrix where you have 'rounded' them to the same number of digits; though they may 'look' the same on the printout, there can be differences that you don't see in the trailing digits.
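As a rough sketch of what I mean (the matrix `m` and the choice of 7 digits are made up for illustration, not taken from the question's data), two values that print identically can still fail an exact comparison until they are rounded:

```r
# Hypothetical values: identical at 7 decimal places, different beyond that
m <- matrix(c(0.58634310001, 0.58634310002,
              0.39522370000, 0.70710680000),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("PC1", "PC2"), c("V1", "V2")))

# Exact comparison: no element is flagged as a duplicate
exact <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
any(exact)

# Round to a common number of digits, then look for duplicates again
r <- round(m, digits = 7)
rounded <- duplicated(r, MARGIN = 0) | duplicated(r, MARGIN = 0, fromLast = TRUE)
result <- colnames(r)[unique(col(rounded)[rounded])]
result
```

How many digits to keep depends on your data; 7 is only an assumption here.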
Upvotes: 1
Reputation: 99341
There may be a better way, but here's my take on it.
## coerce to matrix (if not already)
m <- as.matrix(df)
## find duplicates across both margins
d <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
## grab the unique col names
colnames(m)[unique(col(d)[d])]
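To see why the two duplicated() calls are combined, here is a small sketch on a toy vector (`x` is made up for illustration). duplicated() on its own only flags the later occurrences, so the first copy of each value would be missed without the fromLast = TRUE pass:

```r
x <- c(1, 2, 1)

# flags only the second occurrence of 1
forward <- duplicated(x)

# scanning from the end flags only the first occurrence of 1
backward <- duplicated(x, fromLast = TRUE)

# combining both marks every element that has a twin somewhere
either <- forward | backward
either
```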
Examples: On your first data frame -
df1 <- read.table(text = "V1 V2 V3 V4
PC1 0.5863431 0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068 0.7071068 1.665335e-16 3.885781e-16", header = TRUE)
m1 <- as.matrix(df1)
d1 <- duplicated(m1, MARGIN = 0) | duplicated(m1, MARGIN = 0, fromLast = TRUE)
colnames(m1)[unique(col(d1)[d1])]
# [1] "V1" "V2" "V3" "V4"
And on the second -
df2 <- read.table(text = "V1 V2 V3 V4
PC1 -0.5987139 -0.5987139 -0.03790446 0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3 0.3986891 0.3523926 -0.11045319 0.8394442", header = TRUE)
m2 <- as.matrix(df2)
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)
colnames(m2)[unique(col(d2)[d2])]
# [1] "V1" "V2"
Side note: Since your data contains all numeric values I would recommend beginning with a matrix instead of a data frame.
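If you want the row and column indices rather than just the column names, the same logical matrix can probably be passed to which() with arr.ind = TRUE. This is only a sketch on the second data frame; `idx` and `dup_table` are names I made up for illustration.

```r
df2 <- read.table(text = "V1 V2 V3 V4
PC1 -0.5987139 -0.5987139 -0.03790446 0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3 0.3986891 0.3523926 -0.11045319 0.8394442", header = TRUE)
m2 <- as.matrix(df2)
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)

## one row per duplicated element, with its row and column position
idx <- which(d2, arr.ind = TRUE)

## translate the positions back to dimnames and attach the values
dup_table <- data.frame(row   = rownames(m2)[idx[, "row"]],
                        col   = colnames(m2)[idx[, "col"]],
                        value = m2[idx])
dup_table
```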
Upvotes: 8
Reputation: 17611
A slightly different approach using which and apply:
# convert to matrix
mat1 <- as.matrix(df1)
# find duplicates and store them
dups <- mat1[which(duplicated(c(mat1)))]
# identify columns containing a value in dups
names(which(apply(mat1, 2, function(x) any(x %in% dups))))
#[1] "V1" "V2" "V3" "V4"
mat2 <- as.matrix(df2)
dups <- mat2[which(duplicated(c(mat2)))]
names(which(apply(mat2, 2, function(x) any(x %in% dups))))
#[1] "V1" "V2"
Upvotes: 3