Hydro
Hydro

Reputation: 1117

fill in missing values with continous record in R?

I am trying to have a continuous record with NA for any missing day. In my code below I created some Date with Fake Data. I intentionally removed some days just to reflect my case. I want a continuous record and use NA for Day/Date where i do not have value. For example, myData in my code should have the 3rd day of every month with value as NA. This is just an example, i have many dates/days missing in my actual data but want to have a continous date with value set as NA for those missing dates

library(lubridate)
library(tidyverse)

set.seed(1500)

DF <-  data.frame(Date = seq(as.Date("2000-01-01"), to = as.Date("2010-12-31"), by = "days"), FakeData = runif(4018, 0,50))


myData <- DF %>% 
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>% 
  filter(!Day == 03)

Upvotes: 1

Views: 61

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389235

You can merge myData with DF to get all the days with NA appended

merge(DF[-2], myData, all.x = TRUE)

#       Date FakeData Year Month Day
#1 2000-01-01 31.19773 2000     1   1
#2 2000-01-02 48.61096 2000     1   2
#3 2000-01-03       NA   NA    NA  NA
#4 2000-01-04 17.11499 2000     1   4
#5 2000-01-05 15.52981 2000     1   5
#6 2000-01-06 12.92870 2000     1   6
#....

In case you don't have access to DF you can generate one by :

DF <- data.frame(Date = seq(min(myData$Date), max(myData$Date), by = 'day'))

Upvotes: 2

RyanFrost
RyanFrost

Reputation: 1428

You can use tidyr's complete for this, which fills in any rows that are missing from a specified vector (in this case a sequence containing all dates between the first & last dates in your dataframe)

myData %>% 
  complete(Date = seq(first(Date), last(Date), by = "day"))

#> # A tibble: 4,018 x 5
#>    Date       FakeData  Year Month   Day
#>    <date>        <dbl> <dbl> <dbl> <int>
#>  1 2000-01-01     31.2  2000     1     1
#>  2 2000-01-02     48.6  2000     1     2
#>  3 2000-01-03     NA      NA    NA    NA
#>  4 2000-01-04     17.1  2000     1     4
#>  5 2000-01-05     15.5  2000     1     5
#>  6 2000-01-06     12.9  2000     1     6
#>  7 2000-01-07     19.8  2000     1     7
#>  8 2000-01-08     48.1  2000     1     8
#>  9 2000-01-09     47.5  2000     1     9
#> 10 2000-01-10     40.7  2000     1    10
#> # … with 4,008 more rows

Created on 2020-06-17 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions