Reputation: 67
i have a data frame as follows, I want to create a matrix, where the average sleep duration hour (SLP) is shown according to 3 recruiting site(site) and 5 recruiting year(year).
SLP site year
8.6 1 2008
7.2 1 2005
6.4 2 2006
9.5 3 2007
6.1 2 2009
3.6 1 2005
8.6 1 2008
7.2 1 2005
6.4 2 2006
9.5 3 2007
6.1 2 2009
5.1 3 2008
2.1 2 2006
My desired output is:
1 2 3
2005 6.00 - -
2006 - 4.97 -
2007 - - 9.5
2008 8.60 - 5.1
2009 - 6.10 -
Column name is variable of site, row name is variable of year and values in each cell is average of SLP. How do I do this?
Upvotes: 3
Views: 9338
Reputation: 269481
Here are a few different solutions which use no packages:
1) tapply This uses no packages. It produces a "matrix"
output with NA values for the empty cells:
tapply(DF$SLP, DF[c("year", "site")], mean)
giving:
site
year 1 2 3
2005 6.0 NA NA
2006 NA 4.966667 NA
2007 NA NA 9.5
2008 8.6 NA 5.1
2009 NA 6.100000 NA
2) aggregate/xtabs This uses aggregate
+ xtabs
. This creates an object of class c("xtabs", "table")
with zero values for empty cells:
fo <- SLP ~ year + site
xtabs(fo, aggregate(fo, DF, mean))
giving;
site
year 1 2 3
2005 6.000000 0.000000 0.000000
2006 0.000000 4.966667 0.000000
2007 0.000000 0.000000 9.500000
2008 8.600000 0.000000 5.100000
2009 0.000000 6.100000 0.000000
3) aggregate/reshape This also uses aggregate
but uses reshape
rather than xtabs
. It gives a data frame r
with NA's for empty cells. The last line makes the column names consistent with the prior solutions and could be omitted if this were not important.
ag <- aggregate(SLP ~ site + year, DF, mean)
r <- reshape(ag, dir = "wide", idvar = "year", timevar = "site")
names(r) <- sub(".*[.]", "", names(r))
giving:
> r
year 1 2 3
1 2005 6.0 NA NA
3 2006 NA 4.966667 NA
5 2007 NA NA 9.5
2 2008 8.6 NA 5.1
4 2009 NA 6.100000 NA
Note: The input DF
in reproducible form used is:
DF <- structure(list(SLP = c(8.6, 7.2, 6.4, 9.5, 6.1, 3.6, 8.6, 7.2,
6.4, 9.5, 6.1, 5.1, 2.1), site = c(1L, 1L, 2L, 3L, 2L, 1L, 1L,
1L, 2L, 3L, 2L, 3L, 2L), year = c(2008L, 2005L, 2006L, 2007L,
2009L, 2005L, 2008L, 2005L, 2006L, 2007L, 2009L, 2008L, 2006L
)), .Names = c("SLP", "site", "year"), class = "data.frame", row.names = c(NA,
-13L))
Upvotes: 4
Reputation: 444
another solution
library(tidyr)
library(dplyr)
df%>%
group_by(year, site) %>%
summarise(m=mean(SLP)) %>%
spread(site, m )%>%
as.matrix()
Upvotes: 2
Reputation: 38500
Building on @g-grothendieck's use of xtabs, we can combine this with table
and ifelse
to return the same result.
# get a count of the number of observations per matrix cell (filling 0s with 1)
tempTab <- ifelse(with(df, table(year, + site)) == 0, 1, with(df, table(year, + site)))
tempTab
year 1 2 3
2005 3 1 1
2006 1 3 1
2007 1 1 2
2008 2 1 1
2009 1 2 1
Now use xtabs
, which returns a sum of the values when multiple observations are in a cell and divide by tempTab to get the mean.
xtabs(SLP ~ year + site, df) / tempTab
site
year 1 2 3
2005 6.000000 0.000000 0.000000
2006 0.000000 4.966667 0.000000
2007 0.000000 0.000000 9.500000
2008 8.600000 0.000000 5.100000
2009 0.000000 6.100000 0.000000
Upvotes: 0
Reputation: 887038
We can use acast
library(reshape2)
acast(df1, year~site, value.var="SLP", mean)
Or using tapply
from base R
with(df1, tapply(SLP, list(year, site), FUN = mean))
Upvotes: 4