Alvaro Morales
Alvaro Morales

Reputation: 1925

How to compare every row of dataframe to dataframe in R?

I want to get the number of values that are equal to every other row in dataframe:

library(tidyverse)

df <- tibble(
  a = c(1, 1, 5, 1),
  b = c(2, 3, 2, 8),
  c = c(2, 6, 2, 2)
)

desired output:

# A tibble: 4 x 4
      a     b     c desired_column
  <dbl> <dbl> <dbl> <list>        
1     1     2     2 <dbl [4]>     
2     1     3     6 <dbl [4]>     
3     5     2     2 <dbl [4]>     
4     1     8     2 <dbl [4]> 

enter image description here

In the column "desired_column": firt row: 3, 1, 2, 2:

3: is because the first row has the same three values compared to itself

1: is because there is one value with the same value in both rows and same column (first and second):

enter image description here

2: There are two values that are equal in first and third row and same column :

enter image description here

2: There are two values that are equal in first and fourth row and same column :

enter image description here

The second, third and fourth row of "desired_column" are results of the same process: The ith number in the result is the number of values in common between the current row and the ith row

Upvotes: 1

Views: 1794

Answers (3)

awaji98
awaji98

Reputation: 685

Another approach is to make a function using a couple of for loops:

count_combs <- function(df){
output <- list()
vector <- NULL
for(i in 1:nrow(df)){
  for(j in 1:nrow(df)){
  vector[j] <- sum(df[i,] %in% df[j,])
}
output[[i]] <- vector
}
return(output)
}

df$desired_column<- count_combs(df)

Here the count_combs function counts the combinations of each row iterated once by i and again by j, summing up each time the row elements are %in% the comparison row.

Upvotes: 0

Jon Spring
Jon Spring

Reputation: 66445

My approach was to join the data to itself, to make a table comparing each value to the values of that column in each original row. Then we count the matches and pivot wider again.

df %>%
  rowid_to_column() %>%
  pivot_longer(-rowid) -> df2

left_join(df2, df2, by = "name") %>%
  count(rowid.x, rowid.y, wt = value.x == value.y) %>%     # Edit - shorter
  pivot_wider(names_from = rowid.y, values_from = n) %>%
  nest(desired_column = c(`1`:`4`)) %>%
  select(-rowid.x) -> matches

bind_cols(df, matches)


# A tibble: 4 x 4
      a     b     c desired_column  
  <dbl> <dbl> <dbl> <list>          
1     1     2     2 <tibble [1 × 4]>
2     1     3     6 <tibble [1 × 4]>
3     5     2     2 <tibble [1 × 4]>
4     1     8     2 <tibble [1 × 4]>


> matches %>%
+   unnest(cols = c(desired_column))
# A tibble: 4 x 4
    `1`   `2`   `3`   `4`
  <int> <int> <int> <int>
1     3     1     2     2
2     1     3     0     1
3     2     0     3     1
4     2     1     1     3

Upvotes: 3

Hong
Hong

Reputation: 594

You can do this: in short, with each row of the dataframe, duplicate it to create a new dataframe with all values changed to that row, and compare that dataframe with the original (whether the values are the same). rowSums of each of that comparison will give you the vectors you want.

# Create the desired output in list 
lst <- 
  lapply(1:nrow(df), function(nr) {
     rowSums(replicate(nrow(df), df[nr, ], simplify = FALSE) %>% 
             do.call("rbind", .) == df)})

# To create the desired dataframe
df %>% tibble(desired_column = I(lst))

In tibble call in the last row, I() is used to put in list output as a column.

Upvotes: 0

Related Questions