Reputation: 177
I have a large .csv file containing the results of recent large-scale forest surveys, in which each row contains a given individual tree's location, species identity, and measured cross-sectional area. I read this .csv into RStudio using fread()
to produce a data.table
. I want to collapse this large data.table
into a matrix
such that each row corresponds with a location, each column corresponds with a single species, and each cell contains the sum of all cross-sectional areas of that species at that location.
Below is a dummy data.table
in the format of my data, as copied from the console. Values in cells are summed values from column x-sect area
in raw.input
.
> raw.input <- fread("raw_input.csv")
> raw.input
site sp x-sect area
1: hilltop sp2 10
2: hilltop sp1 3
3: hilltop sp1 5
4: hilltop sp1 4
5: hilltop sp1 3
6: stream sp3 45
7: stream sp3 50
8: stream sp1 4
Below is a matrix
in my desired format, generated as a .csv is MS Excel, read in using fread()
, and converted to a matrix
in RStudio.
> mtrx.tmp <- fread("mtrx_final.csv")
> mtrx <- as.matrix(mtrx.tmp[,2:4]) #remove character strings so matrix is numeric
> row.names(mtrx) <- mtrx.tmp$site #mtrx.tmp$site is equivalent to mtrx.tmp[,1] in content
> mtrx
sp1 sp2 sp3
hilltop 15 10 0
stream 4 0 95
If a data.table
is an inappropriate/inefficient format in which to read in this data set please do include that in your answer.
Upvotes: 1
Views: 427
Reputation: 8506
You can use dcast
from data.table
for that (and data.table
is perfectly suited for this task):
library(data.table)
raw.input <- structure(list(site = c("hilltop", "hilltop", "hilltop", "hilltop",
"hilltop", "stream", "stream", "stream"), sp = c("sp2", "sp1",
"sp1", "sp1", "sp1", "sp3", "sp3", "sp1"), `x-sect area` = c(10L,
3L, 5L, 4L, 3L, 45L, 50L, 4L)), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
dcast(raw.input, site ~ sp, value.var="x-sect area", fun.aggregate = sum) |>
as.matrix(, rownames=1)
#> sp1 sp2 sp3
#> hilltop 15 10 0
#> stream 4 0 95
Created on 2022-07-27 by the reprex package (v2.0.1)
Upvotes: 3
Reputation: 177
Partial answer - function aggregate()
performs the required location- and species-level summing of cross-sectional area.
> aggregate(raw.input$`x-sect area`,list(raw.input$site,raw.input$sp),FUN=sum)
Group.1 Group.2 x
1 hilltop sp1 15
2 stream sp1 4
3 hilltop sp2 10
4 stream sp3 95
Upvotes: 0