Avijit Mallick

Reputation: 37

Using dplyr, finding if selected column values are matching with other column value in R data frame

I have a data frame with more than 150 columns. For example:

df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"), 
                 num = c(1,NA, 0,NA, 1, NA, 0), 
                 place=c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

The data frame prints as:

  name num place
1 Andy   1  Andy
2  Bob  NA   Bob
3 Andy   0  Andy
4  Cha  NA   Cha
5 Andy   1  Andy
6  Bob  NA   Bob
7  Dil   0   Dil

I noticed that although the variable names name and place differ, the values in column 1 (name) and column 3 (place) are the same. There are 150 columns in my data frame, so I want to find the variables that hold the same information as the variable name (column 1).

Upvotes: 0

Views: 685

Answers (3)

RLesur

Reputation: 5910

If you want to test if two columns are strictly identical, use identical(), e.g.

purrr::map_lgl(df, ~ identical(., df$name))

You get:

 name   num place 
 TRUE FALSE  TRUE 
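
For readers without purrr, the same check can be sketched in base R with vapply(). The variable names same_as_name and matches are only illustrative. The last two lines show a caveat worth keeping in mind: identical() also compares storage type, so an integer column and a double column holding the same numbers are not identical, while all.equal() is more lenient.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

# One logical value per column: is the column strictly identical to df$name?
same_as_name <- vapply(df, function(col) identical(col, df$name), logical(1))

# Names of the other columns that duplicate `name`
matches <- setdiff(names(df)[same_as_name], "name")
print(matches)

# identical() is strict about storage type (integer vs double) ...
strict <- identical(c(1L, 2L), c(1, 2))
# ... whereas all.equal() compares numeric values with a tolerance
lenient <- isTRUE(all.equal(c(1L, 2L), c(1, 2)))
```

If the 150 columns mix integer and double versions of the same data, the all.equal() form may be the better fit.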

Upvotes: 0

Len Greski

Reputation: 10855

Expanding on Alistaire's comment, a complete solution extracting the duplicate and non-duplicate columns looks like this.

df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"), 
                 num = c(1,NA, 0,NA, 1, NA, 0), 
                 place=c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num2 = c(1,NA, 0,NA, 1, NA, 0))
library(magrittr)
# duplicated columns
df[1,duplicated.default(df)] %>% names(.)
# non-duplicated columns
df[1,!duplicated.default(df)] %>% names(.)

...and the output:

> df[1,duplicated.default(df)] %>% names(.)
[1] "place" "num2" 
> df[1,!duplicated.default(df)] %>% names(.)
[1] "name" "num" 
>
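
One caveat, sketched here with the three-column data frame from the question: when only a single column is flagged, df[1, ...] drops to a plain vector and names() returns NULL. Indexing names(df) directly avoids this. The names dup, dropped and dup_names are only illustrative.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

# Treat the data frame as a list of columns; TRUE marks a repeat of an earlier column
dup <- duplicated.default(df)

# Only one column is flagged, so the row subset is a vector and has no names
dropped <- names(df[1, dup])

# Indexing the name vector works for any number of flagged columns
dup_names <- names(df)[dup]
nondup_names <- names(df)[!dup]
print(dup_names)
print(nondup_names)
```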

Upvotes: 0

Bruno Vilela

Reputation: 103

If you want the positions of the columns whose values match column one, you can use the following code:

which(apply(df[, -1], 2, function(x)all(x == df[, 1])))

Hope it will help you.
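
Two things to watch with the one-liner above: the positions it returns are relative to df[, -1], so they are one less than the positions in df, and all() returns NA rather than TRUE when a column agrees everywhere except at missing values. A possible variant, assuming that comparing the values as character strings is acceptable, is sketched below. The names target, same and pos are only illustrative.

```r
df <- data.frame(name = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"),
                 num = c(1, NA, 0, NA, 1, NA, 0),
                 place = c("Andy", "Bob", "Andy", "Cha", "Andy", "Bob", "Dil"))

target <- as.character(df[[1]])

# Compare every other column with column one as character vectors;
# identical() treats NAs in the same positions as equal
same <- vapply(df[-1],
               function(col) identical(as.character(col), target),
               logical(1))

# Shift by one so the positions refer to the original data frame
pos <- which(same) + 1L
print(pos)
```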

Upvotes: 0
