Matrix from dataframe in R/Python

Question

I have a dataframe like this: If the person with that id has that fruit the value is 1, else it is 0. ID column is a primary key.

ID	Apple	Orange	Pear	Grapes
E1	1	0	1	1
E2	0	0	1	0
E3	0	1	1	0
E4	1	1	0	0
E5	1	0	0	1

I want a output as a matrix like this. This is a count of people who has both i and j fruit. Where i and j both are same, eg: cell apple x apple, we are looking at the count of all the people who has at least an apple apple. Here, 3 people have apple, so the value is 3. Similarly, we have only one person who has both apple and orange(E4) so the count is 1 in both (orange x apple) and (apple x orange) cells. We have 2 people in the above dataframe who has both grapes and apple(E1 and E5), so value in that cell is 2.

Edits: I want output like this

	Apple	Orange	Pear	Grapes
Apple	3	1	1	2
Orange	1	2	1	0
Pear	1	1	3	1
Grapes	2	0	1	2

I am new to R and Python, and not sure how to achieve this. Any help in either R or Python would be really appreciated! Please feel free to ask questions if anything is not clear or anything is missing. Thanks!

Maurits Evers · Accepted Answer

Here is a tidyverse R option:

library(tidyverse)
data %>%
    pivot_longer(-ID) %>%
    filter(value > 0) %>%
    select(-value) %>%
    group_by(ID) %>%
    nest() %>%
    mutate(data = map(data, ~ expand.grid(.x$name, .x$name))) %>%
    unnest(data) %>%
    group_by(Var1, Var2) %>%
    summarise(n = n(), .groups = "drop") %>%
    filter(Var1 != Var2) %>%
    pivot_wider(names_from = Var1, values_from = n, values_fill = 0)
## A tibble: 4 x 5
#  Var2   Apple  Pear Grapes Orange
#         
#1 Pear       1     0      1      1
#2 Grapes     2     1      0      0
#3 Orange     1     1      0      0
#4 Apple      0     1      2      1

The idea is to use expand.grid to generate all pairwise combinations. The rest is counting occurrences and reshaping. Re-order rows & columns as necessary.

PS. I should mention that the result is slightly different from the expected output. For example, I don't understand why (Pear, Pear) = 1. I assume this in an error. Since you're asking for co-occurrences, all diagonal elements should be zero.

Sample data

data <- read.table(text = "ID   Apple   Orange  Pear    Grapes
E1  1   0   1   1
E2  0   0   1   0
E3  0   1   1   0
E4  1   1   0   0
E5  1   0   0   1", header = T)

Matrix from dataframe in R/Python

Answers (2)

Sample data

Related Questions