RBeginner
RBeginner

Reputation: 51

Replace zero values with mean and then finding the daily sum based on hourly data

the following data is a subset of a larger dataset(power_use):

         ï..time     watt_hour         
1    18-01-22 0:00    0.0261
2    18-01-22 1:00    0.0304
3    18-01-22 2:00    0.0275
4    18-01-22 3:00    0.0275
5    18-01-22 4:00    6.4600
6    18-01-22 5:00    0.0275
7    18-01-22 6:00    3.9500
8    18-01-22 7:00    0.0259
9    18-01-22 8:00    0.2470
10   18-01-22 9:00    1.1200 

I am trying to replace all zero and NA values by imputing(replacing with the mean while omitting zeros and NA). I have the below code:

dates <- power_use$ï..time
avg = power_use$watt_hour

newdata = power_use %>%
  mutate(dates=as.POSIXct(dates,format="%d-%m-%y %H:%M",tz="")) %>%
  group_by(date(dates)) %>%
  summarise(mean=mean(avg,na.rm=TRUE),max=max(avg,na.rm=TRUE),
            min=min(avg,na.rm=TRUE))

The result i am getting is that all new daily values are equal to the mean. I have written this exact same code on another set of data where the only difference is that there are no zero values but only NA and the data is taken every 30 minutes (it worked perfectly well).

After replacing zeros and NA with the mean I want to take the daily sum. Some days have 2 data points while others have more.

Any help?

Upvotes: 0

Views: 904

Answers (2)

RBeginner
RBeginner

Reputation: 51

I managed to solve my problem by doing the following:

ac13 = as.numeric(power_use$watt_hour)
ac13[ac13==0] = NA
avg_ac13 = mean(na.omit(ac13))

power_use_replaced <- power_use %>%
 mutate(ac13_mean_replaced  = ifelse(is.na(ac13), avg_ac13, ac13))
 sum(is.na(power_use_replaced$watt_hour))
 sum(is.na(power_use_replaced$ac13_mean_replaced))
 head(power_use_replaced)

Upvotes: 0

LGe
LGe

Reputation: 516

If you replace all zeros with NA in a first step, you could use na_mean from the imputeTS package:

library(imputeTS)
library(tidyverse)
library(lubridate)

data <- tibble(date = c(ymd("2013-01-01")+ months(0:17)),
               value = c(NA,0,0,0,0,0,0,7,NA,0,0,0,11,23,54,33,45,0))

data[data == 0] <- NA
na_mean(data, option = "mean", maxgap = Inf)

Upvotes: 1

Related Questions