Jarrod Ver Steeg

Reputation: 25

Using a for loop to fill out a matrix when column conditions are met in R

I am learning R for an ecological study, and I am trying to write a function that creates multiple matrices.

My data frame looks like:

df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
             Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
             Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
             Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

Printed:

Species Count Haul Year
1        a     2    1 2000
2        b     3    1 2000
3        c     1    2 2000
4        a     3    2 2000
5        d     4    1 2000
6        a     1    3 2000
7        b     2    2 2000
8        c     1    3 2000
9        c     1    4 2000
10       a     3    1 2001
11       c     2    1 2001
12       b     4    2 2001
13       e     1    1 2001

I am looking to create a for loop that will produce matrices and store them in a list. These matrices will be based on the Haul and Species in each year.

For example, I have been trying something like:

for (i in sort(unique(df$Year))) {
ncol <- sort(unique(unlist(df$Species)))
nrow <- sort(unique(unlist(subset(df, Year == i, select=c("Haul")))))
mat <- matrix(0, length(nrow), length(ncol),
              dimnames = list(nrow, ncol))
mat[as.matrix(df[c("Haul", "Species")])] <- df$Count

This has not been working.

I am looking for a solution like

list[[1]]
[["2000"]] a  b  c  d  e
         1 2  3  0  4  0
         2 3  2  1  0  0
         3 1  0  1  0  0
         4 0  0  1  0  0

[["2001"]] a  b  c  d  e 
         1 3  0  2  0  1  
         2 0  4  0  0  0

The goal is to have the columns be every species ever seen (across all years) and the rows be the specific hauls for that year. The for loop would then collect the matrices in a list.

The main thing I have tried is creating a zeroed matrix and filling it with a mat[as.matrix()] assignment, but I keep getting a "subscript out of bounds" error.

I have tried a lot of methods but I am only learning from what I can find online. Any help would be greatly appreciated. Thank you!

Upvotes: 1

Views: 58

Answers (3)

Jordo82

Reputation: 816

It's not clear to me why you would want to do this as a list of matrices, especially when your original data is already tidy. If you're just looking to transform from long to wide data by Species, this should do it.

library(tidyverse)

df %>% 
  #spread Species from long to wide data
  spread(key = Species, value = Count, fill = 0) %>% 
  #Make Year the first column
  select(Year, everything()) %>% 
  #sort by Year and Haul
  arrange(Year, Haul)

Year Haul a b c d e
2000    1 2 3 0 4 0
2000    2 3 2 1 0 0
2000    3 1 0 1 0 0
2000    4 0 0 1 0 0
2001    1 3 0 2 0 1
2001    2 0 4 0 0 0
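
If a list of matrices is still wanted, the wide table can be split by Year afterwards. The following is only a sketch: it swaps spread() for pivot_wider(), which newer tidyr versions recommend instead, and it assumes a reasonably recent tidyr (the scalar values_fill and names_sort arguments arrived around version 1.1.0, as far as I know). The names wide and wide_list are made up for this example.

```r
library(tidyr)

df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(rep(2000, 9), rep(2001, 4))
)

# One column per species, zeros where a species was not caught in a haul
wide <- pivot_wider(df, names_from = Species, values_from = Count,
                    values_fill = 0, names_sort = TRUE)
wide <- wide[order(wide$Year, wide$Haul), ]

# Split by year, then turn each piece into a matrix with hauls as row names
wide_list <- lapply(split(wide, wide$Year), function(w) {
  m <- as.matrix(w[, setdiff(names(w), c("Haul", "Year"))])
  rownames(m) <- w$Haul
  m
})

wide_list[["2001"]]
```

Because the widening happens before the split, every matrix keeps a column for all five species, including those not seen in that year.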

Upvotes: 0

Parfait

Reputation: 107642

Consider by (function to split data frames by factor(s) to run processes on subsets) and table (function to build contingency table of counts by combinations of factors). The end result is a named list of matrices.

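# data.frame() no longer creates factors by default in R >= 4.0, so make
# Species a factor to give every species a column in each table
df$Species <- factor(df$Species)

matrix_list <- by(df, df$Year, function(sub) {    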
    mat <- table(sub$Haul, sub$Species)
    mat[as.matrix(sub[c("Haul", "Species")])] <- sub$Count

    return(mat)      
})

matrix_list$`2000`

#   a b c d e
# 1 2 3 0 4 0
# 2 3 2 1 0 0
# 3 1 0 1 0 0
# 4 0 0 1 0 0

matrix_list$`2001`

#   a b c d e
# 1 3 0 2 0 1
# 2 0 4 0 0 0
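
Since the question asked for a for loop specifically, here is a minimal sketch of one, offered as an illustration rather than the definitive approach. The original attempt most likely fails because it indexes with every row of df, so Haul 4 (seen only in 2000) is looked up in the 2001 matrix, which has no such row; it also never stores mat anywhere. The name loop_list is made up for this example.

```r
df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(rep(2000, 9), rep(2001, 4))
)

all_species <- sort(unique(as.character(df$Species)))  # columns: all species
loop_list <- list()

for (yr in sort(unique(df$Year))) {
  sub   <- df[df$Year == yr, ]                  # only this year's rows
  hauls <- as.character(sort(unique(sub$Haul))) # rows: this year's hauls

  mat <- matrix(0, nrow = length(hauls), ncol = length(all_species),
                dimnames = list(hauls, all_species))

  # Two-column character matrix of (row name, column name) pairs,
  # built only from the rows belonging to this year
  idx <- cbind(as.character(sub$Haul), as.character(sub$Species))
  mat[idx] <- sub$Count

  loop_list[[as.character(yr)]] <- mat          # store under the year's name
}

loop_list[["2001"]]
```

Building the index with cbind(as.character(...)) rather than as.matrix() on the data frame avoids any padding that as.matrix() may apply when it formats numeric columns as text. If the same Haul and Species pair could appear more than once in a year, this assignment keeps the last Count instead of summing them.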

Upvotes: 2

r2evans

Reputation: 160447

This suggestion uses tidyr::spread, though it's also feasible to do this in base R using reshape.

out <- by(df, df$Year, function(a) tidyr::spread(a, Species, Count, fill=0))
out
# df$Year: 2000
#   Haul Year a b c d
# 1    1 2000 2 3 0 4
# 2    2 2000 3 2 1 0
# 3    3 2000 1 0 1 0
# 4    4 2000 0 0 1 0
# -------------------------------------------------------------------------------------------- 
# df$Year: 2001
#   Haul Year a b c e
# 1    1 2001 3 0 2 1
# 2    2 2001 0 4 0 0

Technically, the output is

class(out)
# [1] "by"

but that's just a glorified way of providing a by-like printing output. To verify:

str(out)
# List of 2
#  $ 2000:'data.frame': 4 obs. of  6 variables:
#   ..$ Haul: num [1:4] 1 2 3 4
#   ..$ Year: num [1:4] 2000 2000 2000 2000
#   ..$ a   : num [1:4] 2 3 1 0
#   ..$ b   : num [1:4] 3 2 0 0
#   ..$ c   : num [1:4] 0 1 1 1
#   ..$ d   : num [1:4] 4 0 0 0
#  $ 2001:'data.frame': 2 obs. of  6 variables:
#   ..$ Haul: num [1:2] 1 2
#   ..$ Year: num [1:2] 2001 2001
#   ..$ a   : num [1:2] 3 0
#   ..$ b   : num [1:2] 0 4
#   ..$ c   : num [1:2] 2 0
#   ..$ e   : num [1:2] 1 0
#  - attr(*, "dim")= int 2
#  - attr(*, "dimnames")=List of 1
#   ..$ df$Year: chr [1:2] "2000" "2001"
#  - attr(*, "call")= language by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a, Species, Count, fill = 0))
#  - attr(*, "class")= chr "by"

So we can just override the class with

class(out) <- "list"
out
# $`2000`
#   Haul Year a b c d
# 1    1 2000 2 3 0 4
# 2    2 2000 3 2 1 0
# 3    3 2000 1 0 1 0
# 4    4 2000 0 0 1 0
# $`2001`
#   Haul Year a b c e
# 1    1 2001 3 0 2 1
# 2    2 2001 0 4 0 0
# attr(,"call")
# by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a, 
#     Species, Count, fill = 0))

I kept Year in there for simplicity and demonstration (in case you might want to keep it around for some reason), but it's just as easy to remove with:

out <- by(df, df$Year, function(a) tidyr::spread(subset(a, select=-Year), Species, Count, fill=0))

(Since I've already brought in one of the tidyverse packages with tidyr, I could easily have used dplyr::select(a, -Year) instead of the subset call. Over to you and whichever tools you are using.)

I admit now that this is producing data.frames, not matrices. It'd take a little more code to convert the result for each one to a proper matrix.

df2m <- function(x) {
  # assume first column should be row names
  rn <- x[[1]]
  out <- as.matrix(x[-1])
  rownames(out) <- rn
  out
}
lapply(out, df2m)
# $`2000`
#   a b c d
# 1 2 3 0 4
# 2 3 2 1 0
# 3 1 0 1 0
# 4 0 0 1 0
# $`2001`
#   a b c e
# 1 3 0 2 1
# 2 0 4 0 0
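
Note that the per-year results above only contain the species caught in that year (no e in 2000, no d in 2001), while the question asks for a column for every species. One possible base R sketch that keeps all columns uses xtabs on a factor; this is a swapped-in technique, not the spread approach above, and the name xtab_list is made up for this example.

```r
df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(rep(2000, 9), rep(2001, 4))
)

# A factor remembers all of its levels, even inside a one-year subset
df$Species <- factor(df$Species)

xtab_list <- lapply(split(df, df$Year), function(sub) {
  # Sum Count for each Haul x Species cell; unused Species levels are kept
  tab <- xtabs(Count ~ Haul + Species, data = sub)
  m <- unclass(tab)          # drop the xtabs/table class
  attr(m, "call") <- NULL    # drop the stored call, leaving a plain matrix
  m
})

xtab_list[["2001"]]
```

Unlike index assignment, xtabs adds up Count when a Haul and Species pair occurs more than once, which may or may not be what you want.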

Upvotes: 2
