subsetting dataframe results in incorrect output

Question

I'm trying to do a simple task: create a summarised version of my data frame (df) by calculating the mean of a variable with repeated measurements (measured multiple times a day, over several weeks) for each combination of treatment, day and Control. This variable is called "consumption" in my df.

I followed this example and adapted the code to my df and my desired conditions: "Calculate mean of column data based on conditions in another column".

However, when I calculated a few of the means by hand (in Excel), I got completely different results.

Could someone point me to where my code is going wrong?

I do have a few measurements of "0"; they are important and need to be included when calculating the mean.
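As a quick check on made-up values (the vector `x` is hypothetical, not the pastebin data), the condition `consumption >= 0` used in the code below keeps zeros, whereas `> 0` would drop them:

```r
# Hypothetical toy measurements; the zeros are valid readings
x <- c(0, 2, 4, 0, 4)

mean(x)          # 2: every value, zeros included
mean(x[x >= 0])  # 2: `>= 0` keeps the zeros, so this matches mean(x)
mean(x[x > 0])   # 3.333333: `> 0` would silently drop the zeros
```

If the data contain no negative values, mean(consumption) on its own gives the same result as the filtered version.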

Here is a reproducible example:

library(dplyr)

df <- read.table("https://pastebin.com/raw/Zpa8cLBN", header = TRUE)

df_mean <- df %>%
  group_by(treatment, day, Control) %>%
  summarise(
    consumption = first(consumption),
    consumption = last(consumption),
    consumption = mean(consumption[consumption >= 0])
  )

desired_results <- read.table("https://pastebin.com/raw/vZten0jd", header = TRUE) # calculated manually in Excel

When I compare the two, the values in the "consumption" column, which should be the calculated means, are not correct at all.

Thanks everyone

Andy · Accepted Answer

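It appears that I need to give the new columns in summarise() names that are different from the column names in the original df. When a new column reuses the name consumption, every later expression in the same summarise() call refers to that new, already-summarised value instead of the original measurements, so the result effectively ends up being the first measurement of each group rather than its mean.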
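To see why reusing the name consumption goes wrong, here is a minimal sketch on a small hypothetical data frame (`toy`, with a made-up grouping column `grp`; it is not the pastebin data). It repeats the question's three summarise() expressions. Because each expression sees the value created by the one before it, the final column holds only the first measurement of each group (1 for a, 0 for b) instead of the true means (3 and 2).

```r
library(dplyr)

# Hypothetical toy data, not the pastebin data
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b"),
  consumption = c(1, 2, 6, 0, 4)
)

# Reusing the input name: each later expression sees the summarised value
wrong <- toy %>%
  group_by(grp) %>%
  summarise(
    consumption = first(consumption),                  # now 1 (a) and 0 (b)
    consumption = last(consumption),                   # last() of that single value
    consumption = mean(consumption[consumption >= 0])  # mean of one value
  )

wrong$consumption  # 1 0, i.e. just the first value per group
```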

df_mean <- df %>%
  group_by(treatment, day, Control) %>%
  summarise(
    # Repeated output names overwrite earlier ones, so only this expression
    # determines Mean_consumption; the first()/last() lines are not needed.
    Mean_consumption = mean(consumption[consumption >= 0])
  )
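A minimal sketch of the renamed version on a small hypothetical data frame (`toy` and its grouping column `grp` are made up for illustration), followed by an independent base R cross-check with aggregate(), which plays the role of the manual Excel check:

```r
library(dplyr)

# Hypothetical toy data, not the pastebin data
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b"),
  consumption = c(1, 2, 6, 0, 4)
)

# New output name, so mean() still sees the original measurements
right <- toy %>%
  group_by(grp) %>%
  summarise(Mean_consumption = mean(consumption[consumption >= 0]))

right$Mean_consumption  # 3 2

# Independent base R cross-check of the same group means
check <- aggregate(consumption ~ grp, data = toy, FUN = mean)
check$consumption  # 3 2

stopifnot(isTRUE(all.equal(right$Mean_consumption, check$consumption)))
```

Both results are sorted by group, which is why the two vectors can be compared element by element.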

When cross-referenced with my desired_results, this gives what I was looking for.

Thanks @jlesuffleur