Reputation: 1453
I'm trying to add a column of the means to my data-frame with some code I can run over many different columns even if they include NA's (which I think excludes many possibilities).
The best I could do is:
TestData <- data.frame(geo=c(rep("AT",4),rep("DE",4)),time=c(rep(c(1990:1993),2)),value=c(NA,4,20,6,NA,NA,5,3))
mean <- aggregate(value~geo, TestData, mean)
which calculates the correct means by category (geo). How can I rightjoin them to the dataframe so that the mean is not just one observation but shows up at every time point? I'm thinking about ddply but can't get it to work properly. The data frame I'm looking for is:
geo time value mean (or optionally, no problem for me)
1 AT 1990 NA 10 NA
2 AT 1991 4 10 10
3 AT 1992 20 10 10
4 AT 1993 6 10 10
5 DE 1990 NA 4 NA
6 DE 1991 NA 4 NA
7 DE 1992 5 4 4
8 DE 1993 3 4 4
Any help would be appreciated!
Upvotes: 2
Views: 157
Reputation: 887148
Try:
testData1 <- within(TestData, {
Mean <- ave(value, geo, FUN=function(x) mean(x, na.rm=TRUE))
Mean[is.na(value)] <- NA}) #If you don't want `NA` values don't use this step
testData1
# geo time value Mean
#1 AT 1990 NA NA
#2 AT 1991 4 10
#3 AT 1992 20 10
#4 AT 1993 6 10
#5 DE 1990 NA NA
#6 DE 1991 NA NA
#7 DE 1992 5 4
#8 DE 1993 3 4
If you want to find mean
of multiple columns that starts
with name value
:
For example:
TestData1 <- TestData
TestData1$value2 <- c(4, NA, 25, NA, NA, 10,5, 2)
library(dplyr)
res <- left_join(TestData1,
TestData1 %>%
group_by(geo) %>%
mutate_each(funs(mean=mean(., na.rm=TRUE)), starts_with("value")),
by=c("geo", "time"))
colnames(res) <- gsub("\\.y$", ".mean", colnames(res))
res
# geo time value.x value2.x value.mean value2.mean
#1 AT 1990 NA 4 10 14.500000
#2 AT 1991 4 NA 10 14.500000
#3 AT 1992 20 25 10 14.500000
#4 AT 1993 6 NA 10 14.500000
#5 DE 1990 NA NA 4 5.666667
#6 DE 1991 NA 10 4 5.666667
#7 DE 1992 5 5 4 5.666667
#8 DE 1993 3 2 4 5.666667
Upvotes: 2