Reputation: 25
I am learning R for an ecological study and I am trying to write a function to create multiple matrices.
My data frame looks like:
df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))
Printed:
Species Count Haul Year
1 a 2 1 2000
2 b 3 1 2000
3 c 1 2 2000
4 a 3 2 2000
5 d 4 1 2000
6 a 1 3 2000
7 b 2 2 2000
8 c 1 3 2000
9 c 1 4 2000
10 a 3 1 2001
11 c 2 1 2001
12 b 4 2 2001
13 e 1 1 2001
I am looking to create a for loop that will produce and store matrices in a list. These matrices will be based on the Haul and Species in each year.
For example, I have been trying something like:
for (i in sort(unique(df$Year))) {
ncol <- sort(unique(unlist(df$Species)))
nrow <- sort(unique(unlist(subset(df, Year == i, select=c("Haul")))))
mat <- matrix(0, length(nrow), length(ncol),
dimnames = list(nrow, ncol))
mat[as.matrix(df[c("Haul", "Species")])] <- df$Count
}
This has not been working.
I am looking for a solution like
list[[1]]
[["2000"]] a b c d e
1 2 3 0 4 0
2 3 2 1 0 0
3 1 0 1 0 0
4 0 0 1 0 0
[["2001"]] a b c d e
1 3 0 2 0 1
2 0 4 0 0 0
The goal is to have the columns be the full set of species ever seen and the rows be the specific hauls for the year. Then the for loop will stack the matrices in a list.
The main thing I have tried is creating a zeroed matrix and trying to fill in the data with a mat[as.matrix()] call, but I keep getting a subscript out of bounds error.
I have tried a lot of methods but I am only learning from what I can find online. Any help would be greatly appreciated. Thank you!
Upvotes: 1
Views: 58
Reputation: 816
It's not clear to me why you would want to do this as a list of matrices, especially when your original data is already tidy. If you're just looking to transform from long to wide data by Species, this should do it.
library(tidyverse)
df %>%
#spread Species from long to wide data
spread(key = Species, value = Count, fill = 0) %>%
#Make Year the first column
select(Year, everything()) %>%
#sort by Year and Haul
arrange(Year, Haul)
Year Haul a b c d e
2000 1 2 3 0 4 0
2000 2 3 2 1 0 0
2000 3 1 0 1 0 0
2000 4 0 0 1 0 0
2001 1 3 0 2 0 1
2001 2 0 4 0 0 0
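As a side note, spread() has been superseded by pivot_wider() since tidyr 1.0.0. Here is a minimal sketch of the same long-to-wide step with the newer function, assuming tidyr and dplyr are installed; the name wide is just for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,
              2001, 2001, 2001, 2001)
)

wide <- df %>%
  # one column per species, filling missing Year/Haul/Species cells with 0
  pivot_wider(names_from = Species, values_from = Count,
              values_fill = list(Count = 0)) %>%
  # make Year the first column
  select(Year, everything()) %>%
  # sort by Year and Haul
  arrange(Year, Haul)

wide
```

The named-list form of values_fill is used because it is accepted by both older and newer tidyr releases.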
Upvotes: 0
Reputation: 107642
Consider by (a function that splits a data frame by one or more factors and runs a process on each subset) and table (a function that builds a contingency table of counts by combinations of factors). The end result is a named list of matrices.
matrix_list <- by(df, df$Year, function(sub) {
mat <- table(sub$Haul, sub$Species)
mat[as.matrix(sub[c("Haul", "Species")])] <- sub$Count
return(mat)
})
matrix_list$`2000`
# a b c d e
# 1 2 3 0 4 0
# 2 3 2 1 0 0
# 3 1 0 1 0 0
# 4 0 0 1 0 0
matrix_list$`2001`
# a b c d e
# 1 3 0 2 0 1
# 2 0 4 0 0 0
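One caveat, offered as a sketch rather than a correction: the output above shows all five species columns for every year, which relies on Species being a factor. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so table() on one year's subset would only show the species seen in that year. The version below assumes that situation, fixes the species levels explicitly, and uses tapply to sum Count per Haul and Species. The name all_species is introduced here for illustration.

```r
# Sample data from the question
df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,
              2001, 2001, 2001, 2001)
)

# Every species ever seen, used as the column set for all years
all_species <- sort(unique(as.character(df$Species)))

matrix_list <- lapply(split(df, df$Year), function(sub) {
  # keep unused species as factor levels so they still get a column
  species <- factor(sub$Species, levels = all_species)
  # sum Count per Haul x Species; empty cells are filled with 0
  tapply(sub$Count, list(sub$Haul, species), sum, default = 0)
})

matrix_list[["2000"]]
matrix_list[["2001"]]
```

Only the hauls present in a given year become rows, while the columns stay the same across years. Note that the default argument of tapply needs R 3.4.0 or later.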
Upvotes: 2
Reputation: 160447
This suggestion uses tidyr::spread, though it's feasible to do in base R using reshape.
out <- by(df, df$Year, function(a) tidyr::spread(a, Species, Count, fill=0))
out
# df$Year: 2000
# Haul Year a b c d
# 1 1 2000 2 3 0 4
# 2 2 2000 3 2 1 0
# 3 3 2000 1 0 1 0
# 4 4 2000 0 0 1 0
# --------------------------------------------------------------------------------------------
# df$Year: 2001
# Haul Year a b c e
# 1 1 2001 3 0 2 1
# 2 2 2001 0 4 0 0
Technically, the output is
class(out)
# [1] "by"
but that's just a glorified way of providing a by-like printing output. To verify:
str(out)
# List of 2
# $ 2000:'data.frame': 4 obs. of 6 variables:
# ..$ Haul: num [1:4] 1 2 3 4
# ..$ Year: num [1:4] 2000 2000 2000 2000
# ..$ a : num [1:4] 2 3 1 0
# ..$ b : num [1:4] 3 2 0 0
# ..$ c : num [1:4] 0 1 1 1
# ..$ d : num [1:4] 4 0 0 0
# $ 2001:'data.frame': 2 obs. of 6 variables:
# ..$ Haul: num [1:2] 1 2
# ..$ Year: num [1:2] 2001 2001
# ..$ a : num [1:2] 3 0
# ..$ b : num [1:2] 0 4
# ..$ c : num [1:2] 2 0
# ..$ e : num [1:2] 1 0
# - attr(*, "dim")= int 2
# - attr(*, "dimnames")=List of 1
# ..$ df$Year: chr [1:2] "2000" "2001"
# - attr(*, "call")= language by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a, Species, Count, fill = 0))
# - attr(*, "class")= chr "by"
So we can just override the class with
class(out) <- "list"
out
# $`2000`
# Haul Year a b c d
# 1 1 2000 2 3 0 4
# 2 2 2000 3 2 1 0
# 3 3 2000 1 0 1 0
# 4 4 2000 0 0 1 0
# $`2001`
# Haul Year a b c e
# 1 1 2001 3 0 2 1
# 2 2 2001 0 4 0 0
# attr(,"call")
# by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a,
# Species, Count, fill = 0))
I kept Year in there for simplicity and demonstration (in case you might want to keep it around for some reason), but it's just as easy to remove with:
out <- by(df, df$Year, function(a) tidyr::spread(subset(a, select=-Year), Species, Count, fill=0))
(Since I've already brought in one of the tidyverse packages with tidyr, I could easily have used dplyr::select(a, -Year) instead of the subset call. Over to you and whichever tools you are using.)
I admit now that this is producing data.frames, not matrices. It'd take a little more code to convert the result for each one to a proper matrix.
df2m <- function(x) {
# assume first column should be row names
rn <- x[[1]]
out <- as.matrix(x[-1])
rownames(out) <- rn
out
}
lapply(out, df2m)
# $`2000`
# a b c d
# 1 2 3 0 4
# 2 3 2 1 0
# 3 1 0 1 0
# 4 0 0 1 0
# $`2001`
# a b c e
# 1 3 0 2 1
# 2 0 4 0 0
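For the base R route mentioned at the start of this answer, here is a rough sketch with reshape. It assumes each Year/Haul/Species combination appears at most once (true for the sample data), and the names wide, all_species and mats are mine, not from the question. Because the reshape is done on the whole data frame before splitting, every year ends up with the same species columns.

```r
# Sample data from the question
df <- data.frame(
  Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
  Count   = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
  Haul    = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
  Year    = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,
              2001, 2001, 2001, 2001)
)

all_species <- sort(unique(as.character(df$Species)))

# One row per Year/Haul, one Count.<species> column per species
wide <- reshape(df, idvar = c("Year", "Haul"), timevar = "Species",
                v.names = "Count", direction = "wide")
wide[is.na(wide)] <- 0                            # species absent from a haul
names(wide) <- sub("^Count\\.", "", names(wide))  # drop the "Count." prefix

# Split by year and turn each piece into a matrix with hauls as row names
mats <- lapply(split(wide, wide$Year), function(w) {
  m <- as.matrix(w[all_species])
  rownames(m) <- w$Haul
  m
})

mats
```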
Upvotes: 2