beaglebagel
beaglebagel

Reputation: 23

Fill missing values in the data.frame with the data from the same data frame

I'm trying to backfill a fully outerjoined table with nearest preceding column data.

The data frame I have looks like.. (No rows have both sides as NA and the table is sorted by date).

              date     X         Y
2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121     NA
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121     NA
2012-07-05 04:20:50   0.0121     NA
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121     NA
2012-07-05 20:48:45   0.0121     NA
2012-07-05 23:02:34    NA      0.029
2012-07-05 23:30:45    NA      0.029

with this, I'm looking to..

  1. leave the non-data missing rows as it is.
  2. If either one side is missing (NA), then fill it with the "nearest preceding" row which has valid opposite side's value.

And so as the result, I would like to have the table looking like...

              date     X         Y
2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121   0.027
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121   0.028
2012-07-05 04:20:50   0.0121   0.028
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121   0.027
2012-07-05 20:48:45   0.0121   0.027
2012-07-05 23:02:34   0.0121   0.029
2012-07-05 23:30:45   0.0121   0.029

What kind of R commands can I use to achieve this?

Upvotes: 2

Views: 2052

Answers (2)

Jilber Urbina
Jilber Urbina

Reputation: 61154

Maybe you should try this

> library(zoo)
> na.locf(Data)
                x     y
2012-07-17 0.0122  <NA>
2012-07-18 0.0121  <NA>
2012-07-19 0.0121 0.027
2012-07-20 0.0121 0.027
2012-07-21 0.0121 0.028
2012-07-22 0.0121 0.028
2012-07-23 0.0121 0.028
2012-07-24 0.0121 0.027
2012-07-25 0.0121 0.027
2012-07-26 0.0121 0.027
2012-07-27 0.0121 0.029
2012-07-28 0.0121 0.029

where Data is your data.frame

Upvotes: 2

GSee
GSee

Reputation: 49810

Use na.locf from the zoo package

dat <- read.table(text="2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121     NA
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121     NA
2012-07-05 04:20:50   0.0121     NA
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121     NA
2012-07-05 20:48:45   0.0121     NA
2012-07-05 23:02:34    NA      0.029
2012-07-05 23:30:45    NA      0.029")

require("zoo")
na.locf(dat)
#           V1       V2     V3    V4
#1  2012-07-05 00:01:19 0.0122  <NA>
#2  2012-07-05 03:19:34 0.0121  <NA>
#3  2012-07-05 03:19:56 0.0121 0.027
#4  2012-07-05 03:20:31 0.0121 0.027
#5  2012-07-05 04:19:56 0.0121 0.028
#6  2012-07-05 04:20:31 0.0121 0.028
#7  2012-07-05 04:20:50 0.0121 0.028
#8  2012-07-05 04:22:29 0.0121 0.027
#9  2012-07-05 04:24:37 0.0121 0.027
#10 2012-07-05 20:48:45 0.0121 0.027
#11 2012-07-05 23:02:34 0.0121 0.029
#12 2012-07-05 23:30:45 0.0121 0.029

Upvotes: 5

Related Questions