Reputation: 3

Taking the mean of a group of data that is dependent on multiple other columns in the same row in R

I want to take the mean of animal abundance over every group of 4 quadrats. Quadrats should only be averaged together when their station number and areaContro number match.

I'm fairly new to R.

My attempt:

aaply(commData, station ~ areaContro & quadrat ~ station, .fun = mean, .expand = TRUE,.inform = TRUE, .drop = TRUE)

The error: Error in splitter_a(.data, .margins, .expand) :
'pairlist' object cannot be coerced to type 'integer'

structure(list(areaContro = c(29L, 29L, 29L, 29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L), station = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L), quadrat = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L), latitude = c(42.12521667, 
42.12658333, 42.12681667, 42.12705, 42.12466667, 42.12631667, 
42.12671667, 42.1272, 42.12671667, 42.12682833, 42.12726166, 
42.12794499, 42.12771667, 42.1285, 42.12871667, 42.12896667, 
42.12691667, 42.12748333, 42.12763333, 42.12785, 42.127, 42.12711818, 
42.12735152, 42.12755152, 42.1264341, 42.1265095, 42.12664427, 
42.12679211, 42.12703333, 42.12725), longitude = c(-67.33001667, 
-67.32823333, -67.3281, -67.3279, -67.31041667, -67.30906667, 
-67.30876667, -67.30843333, -67.29326667, -67.2942027, -67.29311937, 
-67.2929027, -67.27731667, -67.2768, -67.27655, -67.27628333, 
-67.25879572, -67.25684572, -67.25647905, -67.25616238, -67.2359, 
-67.23562265, -67.23512265, -67.23472265, -67.21841245, -67.21825004, 
-67.21814781, -67.21796007, -67.19853333, -67.19653333), scallops = c(1L, 
0L, 0L, 0L, 4L, 0L, 7L, 3L, 3L, 3L, 1L, 2L, 2L, 1L, 2L, 0L, 2L, 
2L, 2L, 2L, 45L, 11L, 4L, 8L, 12L, 9L, 11L, 11L, 4L, 10L), clappers = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L)), .Names = c("areaContro", 
"station", "quadrat", "latitude", "longitude", "scallops", "clappers"
), row.names = c(NA, 30L), class = "data.frame")

Upvotes: 0

Answers (2)

Majo

Reputation: 186

I think what you're trying to do can be accomplished quite simply.

If you have:

commData <- structure(list(areaContro = c(29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L), station = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L), quadrat = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L), latitude = c(42.12521667, 42.12658333, 42.12681667, 42.12705, 42.12466667, 42.12631667, 42.12671667, 42.1272, 42.12671667, 42.12682833, 42.12726166, 42.12794499, 42.12771667, 42.1285, 42.12871667, 42.12896667, 42.12691667, 42.12748333, 42.12763333, 42.12785, 42.127, 42.12711818, 42.12735152, 42.12755152, 42.1264341, 42.1265095, 42.12664427, 42.12679211, 42.12703333, 42.12725), longitude = c(-67.33001667, -67.32823333, -67.3281, -67.3279, -67.31041667, -67.30906667, -67.30876667, -67.30843333, -67.29326667, -67.2942027, -67.29311937, -67.2929027, -67.27731667, -67.2768, -67.27655, -67.27628333, -67.25879572, -67.25684572, -67.25647905, -67.25616238, -67.2359, -67.23562265, -67.23512265, -67.23472265, -67.21841245, -67.21825004, -67.21814781, -67.21796007, -67.19853333, -67.19653333), scallops = c(1L, 0L, 0L, 0L, 4L, 0L, 7L, 3L, 3L, 3L, 1L, 2L, 2L, 1L, 2L, 0L, 2L, 2L, 2L, 2L, 45L, 11L, 4L, 8L, 12L, 9L, 11L, 11L, 4L, 10L), clappers = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L)), .Names = c("areaContro", "station", "quadrat", "latitude", "longitude", "scallops", "clappers" ), row.names = c(NA, 30L), class = "data.frame")

Check out ?aggregate:

For scallops only, grouped by quadrat alone, just to show you how the function works:

scallop <- aggregate(commData$scallops, by = list(commData$quadrat), FUN = mean)

For all the requested variables:

full_scallop <- aggregate(commData$scallops, by = list(commData$quadrat, commData$areaContro, commData$station), FUN = mean)

Everything all together could look something like this:

aggregate(cbind(scallops, clappers) ~ quadrat + areaContro + station, data = commData, FUN = mean)
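If the goal is one mean per group of four quadrats, as the question seems to describe, the grouping would leave out quadrat and keep only areaContro and station. The following is a minimal sketch under that assumption, not a confirmed reading of the question. To stay self-contained it rebuilds only the needed columns of the posted data, and the name station_means is made up for illustration.

```r
# Rebuild the relevant columns of the posted data (values copied from the question)
commData <- data.frame(
  areaContro = rep(29L, 30),
  station    = rep(1:8, times = c(4, 4, 4, 4, 4, 4, 4, 2)),
  quadrat    = c(rep(1:4, 7), 1:2),
  scallops   = c(1, 0, 0, 0, 4, 0, 7, 3, 3, 3, 1, 2, 2, 1, 2, 0,
                 2, 2, 2, 2, 45, 11, 4, 8, 12, 9, 11, 11, 4, 10),
  clappers   = c(rep(0, 25), 2, rep(0, 4))
)

# Assumption: quadrats are averaged within each areaContro/station pair,
# so quadrat itself is not a grouping variable
station_means <- aggregate(cbind(scallops, clappers) ~ areaContro + station,
                           data = commData, FUN = mean)
print(station_means)
```

With the posted data this gives one row per station. Station 8 has only two quadrats in the sample, so its mean is taken over two values rather than four.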

Upvotes: 0

rld2

Reputation: 31

If you are new to R, I strongly recommend taking a look at the tidyverse, in particular dplyr, for common data manipulation tasks.

The second argument in your aaply call is incorrect. According to the documentation, it accepts a vector giving the subscripts to split the data by (e.g. 1 for rows). Also note that aaply accepts an array and returns an array, whereas commData is a data frame.

I'm confused about which variable(s) you want to average and what the average should be conditioned on. I think you want the average grouped by station and quadrat (and areaContro, but this is constant).
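To illustrate what "a vector giving the subscripts" means, here is a small sketch using base R's apply, whose MARGIN argument follows the same convention (1 for rows, 2 for columns). The matrix m is a made-up example, not part of the question's data.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_means <- apply(m, 1, mean)  # margin 1: split by rows, one mean per row
col_means <- apply(m, 2, mean)  # margin 2: split by columns, one mean per column

# With plyr loaded, aaply(m, 1, mean) should take the margin in the same position
print(row_means)
print(col_means)
```

A formula such as station ~ areaContro is not a valid margin, which is consistent with the coercion error in the question.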

Base R:

tapply(commData$scallops, commData[c("station", "quadrat")], mean)

dplyr:

library(dplyr)
commData %>% group_by(station, quadrat) %>%
  summarise(scallops_mean = mean(scallops))
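Which grouping is appropriate depends on how many rows fall into each group. A quick way to check, sketched here with the station and quadrat values copied from the question (the names rows_per_station and rows_per_cell are made up for illustration), is to count rows per candidate group:

```r
# station and quadrat columns as posted in the question
station <- rep(1:8, times = c(4, 4, 4, 4, 4, 4, 4, 2))
quadrat <- c(rep(1:4, 7), 1:2)

rows_per_station <- table(station)          # rows in each station group
rows_per_cell    <- table(station, quadrat) # rows in each station/quadrat cell

print(rows_per_station)
print(max(rows_per_cell))
```

In the posted sample every station/quadrat cell holds at most one row, so a mean over that grouping returns the original values. Grouping by station alone (together with areaContro, if it varies in the full data) averages across the quadrats of each station.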

Upvotes: 1
