Reputation: 1925
I want to get the number of values that are equal to every other row in dataframe:
library(tidyverse)
df <- tibble(
a = c(1, 1, 5, 1),
b = c(2, 3, 2, 8),
c = c(2, 6, 2, 2)
)
desired output:
# A tibble: 4 x 4
a b c desired_column
<dbl> <dbl> <dbl> <list>
1 1 2 2 <dbl [4]>
2 1 3 6 <dbl [4]>
3 5 2 2 <dbl [4]>
4 1 8 2 <dbl [4]>
In the column "desired_column": firt row: 3, 1, 2, 2:
3: is because the first row has the same three values compared to itself
1: is because there is one value with the same value in both rows and same column (first and second):
2: There are two values that are equal in first and third row and same column :
2: There are two values that are equal in first and fourth row and same column :
The second, third and fourth row of "desired_column" are results of the same process:
The i
th number in the result is the number of values in common between the current row and the i
th row
Upvotes: 1
Views: 1794
Reputation: 685
Another approach is to make a function using a couple of for loops:
count_combs <- function(df){
output <- list()
vector <- NULL
for(i in 1:nrow(df)){
for(j in 1:nrow(df)){
vector[j] <- sum(df[i,] %in% df[j,])
}
output[[i]] <- vector
}
return(output)
}
df$desired_column<- count_combs(df)
Here the count_combs function counts the combinations of each row iterated once by i and again by j, summing up each time the row elements are %in% the comparison row.
Upvotes: 0
Reputation: 66445
My approach was to join the data to itself, to make a table comparing each value to the values of that column in each original row. Then we count the matches and pivot wider again.
df %>%
rowid_to_column() %>%
pivot_longer(-rowid) -> df2
left_join(df2, df2, by = "name") %>%
count(rowid.x, rowid.y, wt = value.x == value.y) %>% # Edit - shorter
pivot_wider(names_from = rowid.y, values_from = n) %>%
nest(desired_column = c(`1`:`4`)) %>%
select(-rowid.x) -> matches
bind_cols(df, matches)
# A tibble: 4 x 4
a b c desired_column
<dbl> <dbl> <dbl> <list>
1 1 2 2 <tibble [1 × 4]>
2 1 3 6 <tibble [1 × 4]>
3 5 2 2 <tibble [1 × 4]>
4 1 8 2 <tibble [1 × 4]>
> matches %>%
+ unnest(cols = c(desired_column))
# A tibble: 4 x 4
`1` `2` `3` `4`
<int> <int> <int> <int>
1 3 1 2 2
2 1 3 0 1
3 2 0 3 1
4 2 1 1 3
Upvotes: 3
Reputation: 594
You can do this: in short, with each row of the dataframe, duplicate it to create a new dataframe with all values changed to that row, and compare that dataframe with the original (whether the values are the same). rowSums
of each of that comparison will give you the vectors you want.
# Create the desired output in list
lst <-
lapply(1:nrow(df), function(nr) {
rowSums(replicate(nrow(df), df[nr, ], simplify = FALSE) %>%
do.call("rbind", .) == df)})
# To create the desired dataframe
df %>% tibble(desired_column = I(lst))
In tibble
call in the last row, I()
is used to put in list output as a column.
Upvotes: 0