Reputation: 1

Calculating repeated means from columns using R

This is hopefully a simple question about loops in R. I have a dataset made up of results from a simulation. Each column holds the results for a single cow, recorded once a day for 30 days, with the whole run repeated 100 times, so each column has 3000 entries.

I would like to calculate the mean of the simulated results for each day, to get a single value per day, per cow. That means taking the mean of the 1st, 31st, 61st, ... entries, then the mean of the 2nd, 32nd, 62nd, ... entries, and so on, to end up with a 30-entry column for each cow. I have been trying to do this with a loop in R but can't work out how. Any advice would be greatly appreciated.

Here is some example data:

a<-seq(from = 1, by = 1, length = 30)
b<-seq(from = 1, by = 0.5, length = 30)
c<-seq(from = 1, by = 2, length = 30)

cow1<-rep(a,100)
cow2<-rep(b,100)
cow3<-rep(c,100)

dat<-as.data.frame(cbind(cow1,cow2,cow3))

Upvotes: 0

Answers (2)

user17309340

I think it is better to construct a "day" column and then use it with tapply. As Xi'an said, there is no need for a loop; a loop would be slower and less clean. In code, this gives:

a <- seq(from = 1, by = 1, length = 30)
b <- seq(from = 1, by = 0.5, length = 30)
c <- seq(from = 1, by = 2, length = 30)

day <- seq(from = 1, by = 1, length = 30)
day <- rep(day,100)

cow1 <- rep(a,100)
cow2 <- rep(b,100)
cow3 <- rep(c,100)

# Construct a data frame; I find this way is better as it gives names to the columns.
dat <- data.frame(day,cow1,cow2,cow3)

# Here are the results
tapply(dat$cow1, dat$day, mean)
tapply(dat$cow2, dat$day, mean)
tapply(dat$cow3, dat$day, mean)
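
The three tapply calls can also be collapsed into one step. This is only a minimal sketch: it assumes every column other than `day` holds one cow's results, and `cow_cols` and `daily_means` are names made up here for illustration.

```r
# Rebuild the example data so this block runs on its own.
day  <- rep(1:30, times = 100)
cow1 <- rep(seq(from = 1, by = 1,   length.out = 30), times = 100)
cow2 <- rep(seq(from = 1, by = 0.5, length.out = 30), times = 100)
cow3 <- rep(seq(from = 1, by = 2,   length.out = 30), times = 100)
dat  <- data.frame(day, cow1, cow2, cow3)

# Assumption: every column except `day` is a cow.
cow_cols <- setdiff(names(dat), "day")

# Run tapply on each cow column; sapply binds the 30 daily means
# per cow into a matrix with one row per day and one column per cow.
daily_means <- sapply(dat[cow_cols], function(x) tapply(x, dat$day, mean))

head(daily_means, 3)
```

If a data frame is preferred over a matrix, `aggregate(. ~ day, data = dat, FUN = mean)` should give the same numbers.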

Upvotes: 1

Tim-TU

Reputation: 408

I agree with TMat, including a column with day is useful.
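
If the data already exists without a day column, as in the question, the column can be derived from the row position. A rough sketch, assuming the rows are stored in simulation order with exactly 30 consecutive days per run; `n_days` is a name introduced here for illustration.

```r
# The data as posted in the question: no day column yet.
dat <- data.frame(
  cow1 = rep(seq(from = 1, by = 1,   length.out = 30), times = 100),
  cow2 = rep(seq(from = 1, by = 0.5, length.out = 30), times = 100),
  cow3 = rep(seq(from = 1, by = 2,   length.out = 30), times = 100)
)

n_days <- 30  # assumption: each run is exactly 30 consecutive days

# Row 1 -> day 1, ..., row 30 -> day 30, row 31 -> day 1 again, and so on.
dat$day <- (seq_len(nrow(dat)) - 1) %% n_days + 1

# Sanity check: each day should appear once per simulation run.
table(dat$day)[1:3]
```

If the rows are not in that order, this mapping would be wrong, so it is worth checking the counts before grouping.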

Here is my working example using the tidyverse:

library(tidyverse)

a <- seq(from = 1, by = 1, length = 30)
b <- seq(from = 1, by = 0.5, length = 30)
c <- seq(from = 1, by = 2, length = 30)

day <- seq(from = 1, by = 1, length = 30)
day <- rep(day,100)

cow1 <- rep(a,100)
cow2 <- rep(b,100)
cow3 <- rep(c,100)

dat <- data.frame(day,cow1,cow2,cow3) %>% 
  pivot_longer(cols = 2:4) %>% 
  group_by(day, name) %>% 
  summarize(mean = mean(value))
#> `summarise()` regrouping output by 'day' (override with `.groups` argument)
dat
#> # A tibble: 90 x 3
#> # Groups:   day [30]
#>      day name   mean
#>    <dbl> <chr> <dbl>
#>  1     1 cow1    1  
#>  2     1 cow2    1  
#>  3     1 cow3    1  
#>  4     2 cow1    2  
#>  5     2 cow2    1.5
#>  6     2 cow3    3  
#>  7     3 cow1    3  
#>  8     3 cow2    2  
#>  9     3 cow3    5  
#> 10     4 cow1    4  
#> # ... with 80 more rows

ggplot(dat, aes(x = day, y = mean, fill = name)) + 
  geom_col(position = "dodge")

Created on 2020-07-08 by the reprex package (v0.3.0)
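
If one column per cow is wanted, as in the question, rather than the long format above, the pivot can be skipped. A sketch, assuming dplyr 1.0.0 or later (needed for `across()`) and that all cow columns start with "cow"; `dat_wide` and `daily_wide` are names chosen here for illustration.

```r
library(dplyr)

# Rebuild the example data so this block runs on its own.
day  <- rep(1:30, times = 100)
cow1 <- rep(seq(from = 1, by = 1,   length.out = 30), times = 100)
cow2 <- rep(seq(from = 1, by = 0.5, length.out = 30), times = 100)
cow3 <- rep(seq(from = 1, by = 2,   length.out = 30), times = 100)
dat_wide <- data.frame(day, cow1, cow2, cow3)

# Group by day, then take the mean of every column whose name starts with "cow".
# The result has one row per day and one mean column per cow.
daily_wide <- dat_wide %>%
  group_by(day) %>%
  summarise(across(starts_with("cow"), mean))

daily_wide
```

The long format is handier for ggplot, while this wide result is closer to the 30-entry column per cow that the question asks for.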

Upvotes: 0
