Reputation: 51
the following data is a subset of a larger dataset(power_use
):
ï..time watt_hour
1 18-01-22 0:00 0.0261
2 18-01-22 1:00 0.0304
3 18-01-22 2:00 0.0275
4 18-01-22 3:00 0.0275
5 18-01-22 4:00 6.4600
6 18-01-22 5:00 0.0275
7 18-01-22 6:00 3.9500
8 18-01-22 7:00 0.0259
9 18-01-22 8:00 0.2470
10 18-01-22 9:00 1.1200
I am trying to replace all zero and NA values by imputing(replacing with the mean while omitting zeros and NA). I have the below code:
dates <- power_use$ï..time
avg = power_use$watt_hour
newdata = power_use %>%
mutate(dates=as.POSIXct(dates,format="%d-%m-%y %H:%M",tz="")) %>%
group_by(date(dates)) %>%
summarise(mean=mean(avg,na.rm=TRUE),max=max(avg,na.rm=TRUE),
min=min(avg,na.rm=TRUE))
The result i am getting is that all new daily values are equal to the mean. I have written this exact same code on another set of data where the only difference is that there are no zero values but only NA and the data is taken every 30 minutes (it worked perfectly well).
After replacing zeros and NA with the mean I want to take the daily sum. Some days have 2 data points while others have more.
Any help?
Upvotes: 0
Views: 904
Reputation: 51
I managed to solve my problem by doing the following:
ac13 = as.numeric(power_use$watt_hour)
ac13[ac13==0] = NA
avg_ac13 = mean(na.omit(ac13))
power_use_replaced <- power_use %>%
mutate(ac13_mean_replaced = ifelse(is.na(ac13), avg_ac13, ac13))
sum(is.na(power_use_replaced$watt_hour))
sum(is.na(power_use_replaced$ac13_mean_replaced))
head(power_use_replaced)
Upvotes: 0
Reputation: 516
If you replace all zeros with NA in a first step, you could use na_mean from the imputeTS package:
library(imputeTS)
library(tidyverse)
library(lubridate)
data <- tibble(date = c(ymd("2013-01-01")+ months(0:17)),
value = c(NA,0,0,0,0,0,0,7,NA,0,0,0,11,23,54,33,45,0))
data[data == 0] <- NA
na_mean(data, option = "mean", maxgap = Inf)
Upvotes: 1