Reputation: 37
I have a data frame with more than 150 columns. For example:
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
num = c(1,NA, 0,NA, 1, NA, 0),
place=c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))
The data frame prints as:
name num place
1 Andy 1 Andy
2 Bob NA Bob
3 Andy 0 Andy
4 Cha NA Cha
5 Andy 1 Andy
6 Bob NA Bob
7 Dil 0 Dil
Although the variable names name and place are different, the values in column 1 (name) and column 3 (place) are the same. My real data frame has more than 150 columns, so I want to find every variable that holds the same information as the variable name (column 1).
Upvotes: 0
Views: 685
Reputation: 5910
If you want to test whether two columns are strictly identical, use identical(), e.g.
purrr::map_lgl(df, ~ identical(., df$name))
You get:
name num place
TRUE FALSE TRUE
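If you would rather not depend on purrr, the same check can be sketched in base R with vapply(). This is only an illustration on the example data from the question; the helper names same_as_name and matches are made up here, and the reference column is dropped from the result because it always matches itself.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

# TRUE for every column whose contents are identical to df$name
same_as_name <- vapply(df, function(col) identical(col, df$name), logical(1))

# Names of the matching columns, excluding the reference column itself
matches <- setdiff(names(df)[same_as_name], "name")
print(matches)
```

Note that identical() is strict: a character column and a factor column with the same labels would not count as a match.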
Upvotes: 0
Reputation: 10855
Expanding on Alistaire's comment, a complete solution that extracts the duplicated and non-duplicated columns looks like this.
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
num = c(1,NA, 0,NA, 1, NA, 0),
place=c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
num2 = c(1,NA, 0,NA, 1, NA, 0))
library(magrittr)
# duplicated columns
df[1,duplicated.default(df)] %>% names(.)
# non-duplicated columns
df[1,!duplicated.default(df)] %>% names(.)
...and the output:
> df[1,duplicated.default(df)] %>% names(.)
[1] "place" "num2"
> df[1,!duplicated.default(df)] %>% names(.)
[1] "name" "num"
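The duplicated approach tells you which columns are repeats, but not which earlier column each one repeats. A rough sketch of one way to recover that grouping, using identical() column by column, follows. The names first_match and groups are invented for this example, and the pairwise comparison may be slow on very wide data frames.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num2 = c(1, NA, 0, NA, 1, NA, 0))

# For each column, find the index of the earliest column with identical contents
first_match <- vapply(seq_along(df), function(i) {
  unname(which(vapply(df, identical, logical(1), df[[i]]))[1])
}, integer(1))

# Group column names by the first column they match
groups <- split(names(df), first_match)
print(groups)
```

Each element of groups holds one set of columns with identical contents, so here name is grouped with place and num with num2.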
Upvotes: 0
Reputation: 103
If you want the positions of the columns whose values equal those of column one, you can use the following code. Because the comparison runs on df[, -1], add 1 to get positions in the original data frame:
which(apply(df[, -1], 2, function(x) all(x == df[, 1]))) + 1
Hope it will help you.
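One caveat when comparing with ==: any comparison involving NA yields NA, so all() can return NA instead of TRUE for two columns that have missing values in the same rows. Below is a minimal sketch of an NA-aware comparison, assuming you want NAs in the same positions to count as equal. The helper same_values is hypothetical, not part of any package.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

# Hypothetical helper: compare two columns by value, treating matching NAs as equal
same_values <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  # The NA pattern must match, and all non-missing values must agree
  identical(is.na(x), is.na(y)) && all(x == y, na.rm = TRUE)
}

# Positions (in the original data frame) of columns equal to column one
pos <- which(vapply(df, same_values, logical(1), y = df[[1]]))
print(pos)
```

Because both columns are converted to character first, this compares values rather than types, so a factor column and a character column with the same labels are treated as equal. Column one itself is always included in the result.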
Upvotes: 0