fre1990
fre1990

Reputation: 177

In R how do I sum values in a data.table column aggregated by two character columns, with matrix with colnames and rownames equal to strings output?

I have a large .csv file containing the results of recent large-scale forest surveys, in which each row contains a given individual tree's location, species identity, and measured cross-sectional area. I read this .csv into RStudio using fread() to produce a data.table. I want to collapse this large data.table into a matrix such that each row corresponds with a location, each column corresponds with a single species, and each cell contains the sum of all cross-sectional areas of that species at that location.

Below is a dummy data.table in the format of my data, as copied from the console. Values in cells are summed values from column x-sect area in raw.input.

> raw.input <- fread("raw_input.csv")
> raw.input
      site  sp x-sect area
1: hilltop sp2          10
2: hilltop sp1           3
3: hilltop sp1           5
4: hilltop sp1           4
5: hilltop sp1           3
6:  stream sp3          45
7:  stream sp3          50
8:  stream sp1           4

Below is a matrix in my desired format, generated as a .csv is MS Excel, read in using fread(), and converted to a matrix in RStudio.

> mtrx.tmp <- fread("mtrx_final.csv")
> mtrx <- as.matrix(mtrx.tmp[,2:4]) #remove character strings so matrix is numeric
> row.names(mtrx) <- mtrx.tmp$site  #mtrx.tmp$site is equivalent to mtrx.tmp[,1] in content
> mtrx
        sp1 sp2 sp3
hilltop  15  10   0
stream    4   0  95

If a data.table is an inappropriate/inefficient format in which to read in this data set please do include that in your answer.

Upvotes: 1

Views: 427

Answers (2)

user12728748
user12728748

Reputation: 8506

You can use dcast from data.table for that (and data.table is perfectly suited for this task):

library(data.table)
    
raw.input <- structure(list(site = c("hilltop", "hilltop", "hilltop", "hilltop", 
"hilltop", "stream", "stream", "stream"), sp = c("sp2", "sp1", 
"sp1", "sp1", "sp1", "sp3", "sp3", "sp1"), `x-sect area` = c(10L, 
3L, 5L, 4L, 3L, 45L, 50L, 4L)), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"))

dcast(raw.input, site ~ sp, value.var="x-sect area", fun.aggregate = sum) |> 
  as.matrix(, rownames=1)
#>         sp1 sp2 sp3
#> hilltop  15  10   0
#> stream    4   0  95

Created on 2022-07-27 by the reprex package (v2.0.1)

Upvotes: 3

fre1990
fre1990

Reputation: 177

Partial answer - function aggregate() performs the required location- and species-level summing of cross-sectional area.

> aggregate(raw.input$`x-sect area`,list(raw.input$site,raw.input$sp),FUN=sum)
  Group.1 Group.2  x
1 hilltop     sp1 15
2  stream     sp1  4
3 hilltop     sp2 10
4  stream     sp3 95

Upvotes: 0

Related Questions