Reputation: 267
Referring to the question "Calculating average of based on condition", I need to calculate the average of column E grouped by column F.
Below is part of my data frame df; my actual data has 65K values.
E F
3.130658445 -1
4.175605237 -1
4.949554963 0
4.653496112 0
4.382672845 0
3.870951272 0
3.905365677 0
3.795199341 0
3.374740696 0
3.104690415 0
2.801178871 0
2.487881321 0
2.449349554 0
2.405409636 0
2.090901539 0
1.632416356 0
1.700583696 0
1.846504012 0
1.949797831 0
1.963114449 0
2.033100326 0
2.014312751 0
1.997178247 0
2.143775497 0
Based on the solution provided in the mentioned post, below is my script.
setDT(df)[, Avg := c(rep(mean(head(df$E, 5)), 5), rep(0, .N-5)),
          cumsum(c(TRUE, diff(abs(F)!=1)==1))]
But when executed I am getting the below error.
Error in rep(0, .N - 5) : invalid 'times' argument
Upvotes: 2
Views: 6302
Reputation: 909
Try this:
library(data.table)
dt <- data.table(df)
dt[, Avg := mean(E), by = "F"]
dt <- unique(dt, by = "F")
This is the result:
E F Avg
1: 3.130658 -1 3.653132
2: 4.949555 0 2.797826
Doing only this:
dt <- data.table(df)
dt[, Avg := mean(E), by = "F"]
you get:
E F Avg
1: 3.130658 -1 3.653132
2: 4.175605 -1 3.653132
3: 4.949555 0 2.797826
4: 4.653496 0 2.797826
5: 4.382673 0 2.797826
6: 3.870951 0 2.797826
7: 3.905366 0 2.797826
8: 3.795199 0 2.797826
9: 3.374741 0 2.797826
10: 3.104690 0 2.797826
11: 2.801179 0 2.797826
12: 2.487881 0 2.797826
13: 2.449350 0 2.797826
14: 2.405410 0 2.797826
15: 2.090902 0 2.797826
16: 1.632416 0 2.797826
17: 1.700584 0 2.797826
18: 1.846504 0 2.797826
19: 1.949798 0 2.797826
20: 1.963114 0 2.797826
21: 2.033100 0 2.797826
22: 2.014313 0 2.797826
23: 1.997178 0 2.797826
24: 2.143775 0 2.797826
Upvotes: 0
Reputation: 1482
Use aggregate:
agg <- aggregate(df$E,by=list(df$F), FUN=mean)
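The same aggregation can also be written with the formula interface of aggregate, which keeps the column names E and F in the result. A minimal sketch, assuming a small df built from the first five rows of the sample in the question (for illustration only):

```r
# First five rows of the sample data from the question (illustration only)
df <- data.frame(
  E = c(3.130658445, 4.175605237, 4.949554963, 4.653496112, 4.382672845),
  F = c(-1, -1, 0, 0, 0)
)

# Mean of E for each distinct value of F; the result has columns F and E
agg <- aggregate(E ~ F, data = df, FUN = mean)
print(agg)
```

The result has one row per distinct value of F, so it is a summary rather than a new column on the original rows.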
You used a data.table example, but you said data frame in your question. With a data.table:
# this will retain all rows and return the mean as a new column (per group)
df[, Mean:=mean(E), by=list(F)]
# this will return means per group only
df[, mean(E),by=.(F)]
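As for the error in the question: rep(0, .N - 5) fails whenever a group has fewer than 5 rows, because the 'times' argument becomes negative. With the sample data, the grouping expression puts the two F = -1 rows into a group of their own, so .N - 5 is -3 there. Below is a sketch of one possible guard, assuming the intent is to put the mean of the first (up to) five E values on those rows and 0 on the remaining rows of each group. The names grp and n_head are introduced here for illustration and are not from the original script.

```r
library(data.table)

# Rows taken from the question's sample: two with F = -1, then six with F = 0
df <- data.frame(
  E = c(3.130658445, 4.175605237, 4.949554963, 4.653496112,
        4.382672845, 3.870951272, 3.905365677, 3.795199341),
  F = c(-1, -1, 0, 0, 0, 0, 0, 0)
)
setDT(df)

# Same grouping expression as in the question, stored in a helper column
df[, grp := cumsum(c(TRUE, diff(abs(F) != 1) == 1))]

# Guarded version: never ask rep() for a negative number of repetitions
df[, Avg := {
  n_head <- min(.N, 5L)  # groups shorter than 5 rows use all of their rows
  c(rep(mean(head(E, n_head)), n_head), rep(0, .N - n_head))
}, by = grp]
print(df)
```

Note that E is used inside the brackets rather than df$E, so the mean is taken over the rows of the current group instead of the whole column.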
Upvotes: 1