Reputation: 737
I have a dataframe called daily which looks like this:
daily[1:10,]
Climate_Division Date Precipitation
1 1 1948-07-01 0.2100000
2 1 1948-07-02 0.7000000
3 1 1948-07-03 0.1900000
4 1 1948-07-04 0.1033333
5 1 1948-07-05 0.1982895
6 1 1948-07-06 0.1433333
7 1 1948-07-07 NA
8 1 1948-07-08 NA
9 1 1948-07-09 NA
10 1 1948-07-10 NA
The objective that I would like to accomplish is average all the day values throughout the years (1948-1995) to replace the NA value that occurs on that particular day. For example, since row 7 has an NA for July 7, 1948, I would average all the July 7 from 1948-1995 and replace that particular day with the average.
What I have tried so far is this:
index <- which(is.na(daily$Precipitation)) # find where the NA's occur
daily_avg <- daily # copy dataframe
daily_avg$Date <- strftime(daily_avg$Date, format="2000-%m-%d") # Change the Date format to represent only the day and month and disregard year
daily_avg <- aggregate(Precipitation~Date, FUN = mean, data = daily_avg, na.rm = TRUE) # find the mean precip per day
daily[index,3] <- daily_avg[daily_avg$Date %in% strftime(daily[index,2], format="2000-%m-%d"), 2]
The last line in the code is not working properly, I'm not sure why yet. That is how my thought process of this problem is going. However, I was wondering if there is a better way of doing it using a built in function that I am not aware of. Any help is greatly appreciated. Thank you
Upvotes: 0
Views: 1290
Reputation: 2892
A slightly neater version of mnel's answer, which I would prefer to the accepted one:
set.seed(1)
library(data.table)
# step 1: form data
daily <- seq(as.Date('1948-01-01'),as.Date('1995-12-31'),by="day")
dd <- data.table(date = daily, precip = runif(length(daily)))
# step 2: add NA values
nas <- sample(length(daily),300, FALSE)
dd[, precip := {is.na(precip) <- nas; precip}]
# step 3: replace NAs with day-of-month across years averages
dd[, c('month','day') := list(month(date), mday(date))]
dd[,precip:= ifelse(is.na(precip), mean(precip, na.rm=TRUE), precip), by=list(month,day)]
Upvotes: 1
Reputation: 121618
I think the data in your example, don't explain the problem. You should give data for a certain day over many years with some NA values. For example, here I change the problem for 2 days over 3 years.
Climate_Division Date Precipitation
1 1 1948-07-01 0.2100000
2 1 1948-07-02 NA
3 1 1949-07-01 0.1900000
4 1 1949-07-02 0.1033333
5 1 1950-07-01 NA
6 1 1950-07-02 0.1433333
The idea if I understand , is to replace NA values by the mean of the values over all years. You can use ave
and transform
to create the new column containing the mean, then replace the NA value with it.
daily$daymonth <- strftime(daily$Date, format="%m-%d")
daily <- transform(daily, mp =ave(Precipitation,daymonth,
FUN=function(x) mean(x,na.rm=TRUE) ))
transform(daily, Precipitation =ifelse(is.na(Precipitation),mp,Precipitation))
Climate_Division Date Precipitation daymonth mp
1 1 1948-07-01 0.2100000 07-01 0.2000000
2 1 1948-07-02 0.1233333 07-02 0.1233333
3 1 1949-07-01 0.1900000 07-01 0.2000000
4 1 1949-07-02 0.1033333 07-02 0.1233333
5 1 1950-07-01 0.2000000 07-01 0.2000000
6 1 1950-07-02 0.1433333 07-02 0.1233333
Upvotes: 3
Reputation: 115505
Using data.table
Some dummy data
set.seed(1)
library(data.table)
daily <- seq(as.Date('1948-01-01'),as.Date('1995-12-31')
dd <- data.table(date = daily, precip = runif(length(daily)))
# add na values
nas <- sample(length(daily),300, FALSE)
dd[, precip := {is.na(precip) <- nas; precip}]
## calculate the daily averages
# add day and month
dd[, c('month','day') := list(month(date), mday(date))]
monthdate <- dd[, list(mprecip = mean(precip, na.rm = TRUE)),
keyby = list(month, date)]
# set key for joining
setkey(dd, month, date)
# replace NA with day-month averages
dd[monthdate, precip := ifelse(is.na(precip), mprecip, precip)]
# set key to reorder to daily
setkey(dd, date)
Upvotes: 2