Reputation: 475
I'm trying to create a subset of my data frame (df) by calculating the mean of a variable with repeated measurements (measured multiple times a day, over several weeks). This variable is called "consumption" in my df.
I followed the example in "Calculate mean of column data based on conditions in another column" and adapted the code to my df and my desired conditions.
However, I calculated a few of the means by hand (using Excel) and got completely different results.
Could someone point me in the right direction as to where my code is going wrong?
A few of my measurements are "0"; they are important and need to be included when calculating the mean.
Here is a reproducible example:
df <- read.table("https://pastebin.com/raw/Zpa8cLBN", header = T)
library(dplyr)
df_mean <- df %>%
  group_by(treatment, day, Control) %>%
  summarise(
    consumption = first(consumption),
    consumption = last(consumption),
    consumption = mean(consumption[consumption >= 0])
  )
desired_results <- read.table("https://pastebin.com/raw/vZten0jd", header = T) # calculated manually in excel
When I compare the two, the results in the column "consumption", which should be the calculated mean, are not correct at all.
Thanks everyone
Upvotes: 1
Views: 52
Reputation: 886938
We can use data.table
library(data.table)
setDT(df)[, .(Mean_consumption = first(consumption),
              Mean_consumptionlast = last(consumption),
              Mean_consumptionfilt = mean(consumption[consumption >= 0])),
          .(treatment, day, Control)]
Upvotes: 1
Reputation: 475
It appears that I need to use variable names in the summarise function that differ from the column names in the original df:
df_mean <- df %>% group_by(treatment,day,Control) %>% summarise(
Mean_consumption = first(consumption), Mean_consumption = last(consumption), Mean_consumption = mean(consumption[consumption >= 0]))
When cross-referenced with my desired_results, it's what I was looking for.
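As far as I can tell, the original attempt failed because summarise() evaluates its expressions in order, and each one sees the columns created by the previous ones. Once consumption is reassigned to first(consumption), it holds a single value per group, so the later mean() only averages that one value. Here is a minimal sketch of the effect; the `toy` data frame and its values are made up for illustration and are not taken from the pastebin data.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(
  treatment   = c("A", "A", "A", "B", "B"),
  consumption = c(0, 2, 4, 1, 5)
)

# Reusing the column name: by the time mean() runs, `consumption`
# has already been replaced by a single value per group.
overwritten <- toy %>%
  group_by(treatment) %>%
  summarise(
    consumption = first(consumption),
    consumption = mean(consumption[consumption >= 0])
  )

# Using a new name leaves the original column intact inside summarise(),
# so mean() sees every measurement in the group.
renamed <- toy %>%
  group_by(treatment) %>%
  summarise(mean_consumption = mean(consumption[consumption >= 0]))

print(overwritten)  # per-group value is the first measurement: 0 and 1
print(renamed)      # per-group value is the actual mean: 2 and 3
```

Since only the last assignment to a repeated name is kept, the first() and last() lines should not be needed if the mean is all you want. Also, the `consumption >= 0` filter keeps the zeros, so they still count towards the mean.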
Thanks @jlesuffleur
Upvotes: 1