Lakis
Lakis

Reputation: 25

Reproducing sumproduct in R into matrix

I have created the following data frame:

df <- data.frame(names1=c('dog','dog','mouse','cat','dog','cat','cat','dog','mouse','mouse'), names2=c('cat','dog','cat','cat','cat','dog','dog','dog','mouse','cat'), values=c(11,5,41,25,101,78,12,41,6,77))

and this following matrix which yields:

my_matrix <- matrix(data=0,nrow = length(unique(df$names1)),ncol = length(unique(df$names2))) rownames(my_matrix) <- c('dog','mouse','cat') colnames(my_matrix) <- c('dog','mouse','cat')

dog mouse cat
dog 0 0 0
mouse 0 0 0
cat 0 0 0

Now I want two things.

  1. Populate the empty matrix with the counts (number of element intersects) coming from df$names1 and df$names2

the result should be:

dog mouse cat
dog 2 0 2
mouse 0 1 2
cat 2 0 1

Populate the empty matrix with the sums (where element intersect) coming from df$names1 and df$names2

the result should be:

dog mouse cat
dog 46 0 112
mouse 0 6 118
cat 90 0 25

I have tried something like

for(i in 1:3){ for(j in 1:3){ my_matrix[i,j] <- sum(df$names1 == df$names2) } }

print(my_matrix)

but dont get the desired results

your help is much appreciated, may thanks

update:

thanks for the may comments, but does anybody have an idea how to do it with the for loop. like looping through each element of the matrix and then assigning the value accordingly? so far I have tested something like

for(i in 1:nrow(my_matrix)){
  for(j in 1:ncol(my_matrix)){
    
    my_matrix[i,j] <-  sum(df$values & df$names1[i] == df$names2[j] & df$names2[j] == df$names1[i]) 
  }
}

print(my_matrix)

but it does not work as desired

Upvotes: 1

Views: 70

Answers (1)

Merijn van Tilborg
Merijn van Tilborg

Reputation: 5887

updated answer with missing animals

original data

df <- data.frame(names1=c('dog','dog','mouse','cat','dog','cat','cat','dog','mouse','mouse'), names2=c('cat','dog','cat','cat','cat','dog','dog','dog','mouse','cat'), values=c(11,5,41,25,101,78,12,41,6,77))

we use data.table here and make it a data.table

library(data.table)
setDT(df)

define the animals that need to be included in the order you want

o <- c("cat", "dog", "mouse", "bear", "monkey")

now append your data with all animals and set their values to NA

df <- rbindlist(list(df, data.table(names1 = o, names2 = o, values = NA)))

to prevent counting NA's change the fun.aggregate

dcast(df, factor(names1, levels = o) ~ factor(names2, levels = o), fun.aggregate = function(x) length(na.omit(x)), value.var = "values")

   names1 cat dog mouse bear monkey
1:    cat   1   2     0    0      0
2:    dog   2   2     0    0      0
3:  mouse   2   0     1    0      0
4:   bear   0   0     0    0      0
5: monkey   0   0     0    0      0

dcast(df, factor(names1, levels = o) ~ factor(names2, levels = o), fun.aggregate = function(x) sum(na.omit(x)), value.var = "values")

   names1 cat dog mouse bear monkey
1:    cat  25  90     0    0      0
2:    dog 112  46     0    0      0
3:  mouse 118   0     6    0      0
4:   bear   0   0     0    0      0
5: monkey   0   0     0    0      0

my original answer

If the output being a matrix is not requirement, you can achieve it easily by going from "long to wide", I use data.table here with dcast.

library(data.table)

setDT(df)

dcast(df, names1 ~ names2, fun.aggregate = length)

#    names1 cat dog mouse
# 1:    cat   1   2     0
# 2:    dog   2   2     0
# 3:  mouse   2   0     1

dcast(df, names1 ~ names2, fun.aggregate = sum, value.var = "values")

#    names1 cat dog mouse
# 1:    cat  25  90     0
# 2:    dog 112  46     0
# 3:  mouse 118   0     6

updated example with ordering

o <- unique(df$names1)
dcast(df, factor(names1, levels = o) ~ factor(names2, levels = o), fun.aggregate = length)
dcast(df, factor(names1, levels = o) ~ factor(names2, levels = o), fun.aggregate = sum, value.var = "values")

Upvotes: 1

Related Questions