Reputation: 153
I am an R beginner. There seems to be a quick way to calculate the mean of every n rows within a column, but is there something similar for standard deviation (or standard error)? I'd like to avoid looping if possible, because this is only a small part of a script that is already getting unwieldy for a beginner. Here is a simplified example of the dataset I will be working with:
Canopy Species Date Pa
1 Maple BETH 4/26/2014 -0.1162607263
2 Maple BETH 4/26/2014 -0.2742194706
3 Maple BETH 4/26/2014 -0.1864006372
4 Maple BETH 4/26/2014 -0.0739905518
5 Maple BETH 4/26/2014 -0.0751169983
6 Maple BETH 4/26/2014 -0.0782771938
7 Maple BETH 4/26/2014 -0.1671646757
8 Maple BETH 4/26/2014 -0.2464696338
9 Maple BETH 4/26/2014 -0.2176720386
10 Maple BETH 4/26/2014 -0.2283216397
11 Maple BETH 4/26/2014 -0.1152989165
12 Maple BETH 4/26/2014 -0.2720884764
13 Maple BETH 4/26/2014 -0.1849383730
14 Maple BETH 4/26/2014 -0.0734205199
15 Maple BETH 4/26/2014 -0.0745294634
16 Maple BETH 4/26/2014 -0.0776640601
17 Maple BETH 4/26/2014 -0.1658603785
18 Maple BETH 4/26/2014 -0.2445047320
19 Maple BETH 4/26/2014 -0.2159337593
20 Maple BETH 4/26/2014 -0.2264833266
And here is the code I was referring to for means. It finds the mean of every 10 rows in the Pa column:
mu<-colMeans(matrix(Table$Pa, nrow=10))
Thank you in advance for your help and please let me know if there is any more information I should provide.
Upvotes: 0
Views: 1693
Reputation: 15163
You can also do this with base R using by:
> n<-nrow(Table)
> index<-ceiling((1:n)/10)
> by(Table$Pa,index,mean)
index: 1
[1] -0.1663894
------------------------------------------------------------
index: 2
[1] -0.1650722
> by(Table$Pa,index,sd)
index: 1
[1] 0.07604938
------------------------------------------------------------
index: 2
[1] 0.07544763
Edit: you can put these in a table, for example, like this:
> cbind(index=unique(index), mean=by(Table$Pa,index,mean), sd=by(Table$Pa,index,sd))
index mean sd
1 1 -0.1663894 0.07604938
2 2 -0.1650722 0.07544763
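The matrix trick from the question also extends to standard deviation and standard error. The following is only a sketch under some assumptions: the Pa values are typed in directly so the example runs on its own, and the names block_size, sdv, se and block_stats are made up for illustration. It also assumes the standard error wanted is the standard error of the mean, sd / sqrt(n).

```r
# Sample data: the 20 Pa values from the question.
Pa <- c(-0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518,
        -0.0751169983, -0.0782771938, -0.1671646757, -0.2464696338,
        -0.2176720386, -0.2283216397, -0.1152989165, -0.2720884764,
        -0.1849383730, -0.0734205199, -0.0745294634, -0.0776640601,
        -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266)

block_size <- 10                     # rows per block (illustrative name)
m <- matrix(Pa, nrow = block_size)   # one column per block of 10 rows

mu  <- colMeans(m)                   # block means, as in the question
sdv <- apply(m, 2, sd)               # block standard deviations
se  <- sdv / sqrt(block_size)        # standard error of each block mean

# Collect everything in one data frame, one row per block.
block_stats <- data.frame(block = seq_len(ncol(m)), mean = mu, sd = sdv, se = se)
print(block_stats)
```

Keep in mind that matrix() only lines up with the blocks when the number of rows is a multiple of 10. With a leftover partial block, the index approach with by is the safer choice.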
Upvotes: 1
Reputation: 1
What @rawr is suggesting, using the dplyr package. The group id is built with ceiling(), so rows 1-10 get id 1, rows 11-20 get id 2, and so on. round() would not work here, because it splits the rows at 5, 15, 25, ... and yields groups of uneven size.
library(dplyr)
df %>%
mutate(id=ceiling(row_number()/10)) %>%
group_by(id) %>%
summarize(mean=mean(Pa),sd=sd(Pa))
For the 20 rows in the question this gives the same values as the by answer:
id mean sd
(dbl) (dbl) (dbl)
1 1 -0.1663894 0.07604938
2 2 -0.1650722 0.07544763
Upvotes: 0
Reputation: 7190
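Here is a mixed base R/dplyr solution. First I create a column named fac_to_spli, which labels each block of 10 rows with its starting row number (1, 11, ...) and serves as the grouping factor. Then group_by and mutate from dplyr add the standard deviation of each block to every row.
library(dplyr)
# each = 10 repeats every block label 10 times; length.out trims the result if nrow(df) is not a multiple of 10
df$fac_to_spli <- rep(seq(from = 1, to = nrow(df), by = 10), each = 10, length.out = nrow(df))
df %>% group_by(fac_to_spli) %>% mutate(stand_dev = sd(Pa))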
Source: local data frame [20 x 6]
Groups: fac_to_spli [2]
Canopy Species Date Pa fac_to_spli stand_dev
(fctr) (fctr) (fctr) (dbl) (dbl) (dbl)
1 Maple BETH 4/26/2014 -0.11626073 1 0.07604938
2 Maple BETH 4/26/2014 -0.27421947 1 0.07604938
3 Maple BETH 4/26/2014 -0.18640064 1 0.07604938
4 Maple BETH 4/26/2014 -0.07399055 1 0.07604938
5 Maple BETH 4/26/2014 -0.07511700 1 0.07604938
6 Maple BETH 4/26/2014 -0.07827719 1 0.07604938
7 Maple BETH 4/26/2014 -0.16716468 1 0.07604938
8 Maple BETH 4/26/2014 -0.24646963 1 0.07604938
9 Maple BETH 4/26/2014 -0.21767204 1 0.07604938
10 Maple BETH 4/26/2014 -0.22832164 1 0.07604938
11 Maple BETH 4/26/2014 -0.11529892 11 0.07544763
12 Maple BETH 4/26/2014 -0.27208848 11 0.07544763
13 Maple BETH 4/26/2014 -0.18493837 11 0.07544763
14 Maple BETH 4/26/2014 -0.07342052 11 0.07544763
15 Maple BETH 4/26/2014 -0.07452946 11 0.07544763
16 Maple BETH 4/26/2014 -0.07766406 11 0.07544763
17 Maple BETH 4/26/2014 -0.16586038 11 0.07544763
18 Maple BETH 4/26/2014 -0.24450473 11 0.07544763
19 Maple BETH 4/26/2014 -0.21593376 11 0.07544763
20 Maple BETH 4/26/2014 -0.22648333 11 0.07544763
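If you would rather not load dplyr for this, the same per-row column can probably be built in base R with ave(), which applies a function within groups and returns a vector as long as its input. This is a sketch, not a drop-in replacement: the data frame here holds only the Pa column from the question, and the index vector follows the ceiling() idea from the by answer.

```r
# Sample data: the 20 Pa values from the question, in a one-column data frame.
Table <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518,
  -0.0751169983, -0.0782771938, -0.1671646757, -0.2464696338,
  -0.2176720386, -0.2283216397, -0.1152989165, -0.2720884764,
  -0.1849383730, -0.0734205199, -0.0745294634, -0.0776640601,
  -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266))

# Block id for every row: rows 1-10 -> 1, rows 11-20 -> 2, ...
index <- ceiling(seq_len(nrow(Table)) / 10)

# ave() computes sd within each block and repeats it on every row of that block.
# FUN must be passed by name, otherwise it is treated as another grouping variable.
Table$stand_dev <- ave(Table$Pa, index, FUN = sd)

print(Table)
```

Like mutate, this keeps all 20 rows. Use by, tapply or summarize instead when you want one row per block.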
Upvotes: 0