Luciano Rodriguez
Luciano Rodriguez

Reputation: 737

Averaging daily data over many years to replace NA values in R

I have a dataframe called daily which looks like this:

      daily[1:10,]
         Climate_Division    Date      Precipitation
      1                 1 1948-07-01     0.2100000
      2                 1 1948-07-02     0.7000000
      3                 1 1948-07-03     0.1900000
      4                 1 1948-07-04     0.1033333
      5                 1 1948-07-05     0.1982895
      6                 1 1948-07-06     0.1433333
      7                 1 1948-07-07            NA
      8                 1 1948-07-08            NA
      9                 1 1948-07-09            NA
      10                1 1948-07-10            NA

The objective that I would like to accomplish is average all the day values throughout the years (1948-1995) to replace the NA value that occurs on that particular day. For example, since row 7 has an NA for July 7, 1948, I would average all the July 7 from 1948-1995 and replace that particular day with the average.

What I have tried so far is this:

 index <- which(is.na(daily$Precipitation)) # find where the NA's occur
 daily_avg <- daily # copy dataframe
 daily_avg$Date <- strftime(daily_avg$Date, format="2000-%m-%d") # Change the Date format to represent only the day and month and disregard year
 daily_avg <- aggregate(Precipitation~Date, FUN = mean, data = daily_avg, na.rm = TRUE) # find the mean precip per day 
 daily[index,3] <- daily_avg[daily_avg$Date %in% strftime(daily[index,2], format="2000-%m-%d"), 2]

The last line in the code is not working properly, I'm not sure why yet. That is how my thought process of this problem is going. However, I was wondering if there is a better way of doing it using a built in function that I am not aware of. Any help is greatly appreciated. Thank you

Upvotes: 0

Views: 1290

Answers (3)

andrekos
andrekos

Reputation: 2892

A slightly neater version of mnel's answer, which I would prefer to the accepted one:

set.seed(1)
library(data.table)
# step 1: form data
daily <- seq(as.Date('1948-01-01'),as.Date('1995-12-31'),by="day")
dd <- data.table(date = daily, precip = runif(length(daily)))
# step 2: add NA values
nas <- sample(length(daily),300, FALSE)
dd[, precip := {is.na(precip) <- nas; precip}]
# step 3: replace NAs with day-of-month across years averages
dd[, c('month','day') := list(month(date), mday(date))]
dd[,precip:= ifelse(is.na(precip), mean(precip, na.rm=TRUE), precip), by=list(month,day)]

Upvotes: 1

agstudy
agstudy

Reputation: 121618

I think the data in your example, don't explain the problem. You should give data for a certain day over many years with some NA values. For example, here I change the problem for 2 days over 3 years.

Climate_Division       Date Precipitation
1                1 1948-07-01     0.2100000
2                1 1948-07-02            NA
3                1 1949-07-01     0.1900000
4                1 1949-07-02     0.1033333
5                1 1950-07-01            NA
6                1 1950-07-02     0.1433333

The idea if I understand , is to replace NA values by the mean of the values over all years. You can use ave and transform to create the new column containing the mean, then replace the NA value with it.

daily$daymonth <- strftime(daily$Date, format="%m-%d") 
daily <- transform(daily, mp =ave(Precipitation,daymonth,
               FUN=function(x) mean(x,na.rm=TRUE) ))
transform(daily, Precipitation =ifelse(is.na(Precipitation),mp,Precipitation))


    Climate_Division       Date Precipitation daymonth        mp
1                1 1948-07-01     0.2100000    07-01 0.2000000
2                1 1948-07-02     0.1233333    07-02 0.1233333
3                1 1949-07-01     0.1900000    07-01 0.2000000
4                1 1949-07-02     0.1033333    07-02 0.1233333
5                1 1950-07-01     0.2000000    07-01 0.2000000
6                1 1950-07-02     0.1433333    07-02 0.1233333

Upvotes: 3

mnel
mnel

Reputation: 115505

Using data.table

Some dummy data

 set.seed(1)
 library(data.table)
 daily <- seq(as.Date('1948-01-01'),as.Date('1995-12-31')
 dd <- data.table(date = daily, precip = runif(length(daily)))
 # add na values
 nas <- sample(length(daily),300, FALSE)
 dd[, precip := {is.na(precip) <- nas; precip}]


 ## calculate the daily averages
 # add day and month
 dd[, c('month','day') := list(month(date), mday(date))]

 monthdate <- dd[, list(mprecip = mean(precip, na.rm = TRUE)),
                  keyby = list(month, date)]
 # set key for joining
  setkey(dd, month, date)
 # replace NA with day-month averages
 dd[monthdate, precip := ifelse(is.na(precip), mprecip, precip)]
 # set key to reorder to daily

 setkey(dd, date)

Upvotes: 2

Related Questions