Reputation: 1814
My data.frame
df <- data.frame(ID=rep(1:3, 3), Obs_1=rnorm(9), Obs_2=rnorm(9), Obs_3=rnorm(9))
I want to calculate the mean of each colum by the ID. I have tried two approaches: 1)
ag <- aggregate(. ~ ID, df, function(x) c(mean = mean(x)))
returns
ID Obs_1 Obs_2 Obs_3
1 1 0.41220831 -0.9999704 -0.7234958
2 2 0.03564336 0.5014259 0.4847635
3 3 0.05647885 0.2067311 -0.0542368
Why does it does not indicate "mean" in the column name and how can make this reported? something like this:
Obs_1.mean Obs_2.mean Obs_3.mean
2) the second approach
df[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]
it gives
unused argument (by = ID)
Thank you
Upvotes: 0
Views: 66
Reputation: 3201
Here is one approach:
df <- data.frame(ID=rep(1:3, 3),
Obs_1=rnorm(9),
Obs_2=rnorm(9),
Obs_3=rnorm(9))
ag <- aggregate(. ~ ID, df, FUN = mean)
colnames(ag) <- sapply(colnames(ag),
FUN = function(x) { paste0(x, ifelse(x=="ID", "", ".mean")) })
It yields:
ID Obs_1.mean Obs_2.mean Obs_3.mean
1 1 0.2675131 -0.3494175 -0.3128264
2 2 1.0988356 -0.6645116 -0.2192402
3 3 -1.1569562 -0.4040192 -0.1265475
About your question why the trick
FUN = function(x) { c(mean = mean(x)) })
to stick the name "mean" to the result of FUN does not work:
I suppose that is due to the grouping that aggregate() performs. Each group might in theory return a different name. Which of these potentially conflicting names should aggregate() use then to name the complete column of the aggregated dataframe...?
If no grouping happens, like here,
sapply(df, FUN = function(x) { c(mean = mean(x)) })
we get the expected names:
ID.mean Obs_1.mean Obs_2.mean Obs_3.mean
2.0000000 0.4265256 -0.2046452 -0.4496054
Upvotes: 2