SRel

Reputation: 423

'Interpolation' of a missing date/value in R?

I have a dataframe like so:

Month         CumulativeSum
2019-02-01    40
2019-03-01    70
2019-04-01    80
2019-07-01    100
2019-08-01    120

The problem is that nothing happened in May and June, so there are no rows for those months. Plotting this as a bar chart leaves empty space on the x-axis. Is there a way to "fill" the missing months with the last known value, like so?

Month         CumulativeSum
2019-02-01    40
2019-03-01    70
2019-04-01    80
2019-05-01    80    <-- filled in
2019-06-01    80    <-- filled in
2019-07-01    100
2019-08-01    120

Upvotes: 2

Views: 133

Answers (2)

ThomasIsCoding

Reputation: 101064

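Here is a base R option using cummax. Every month first gets a placeholder of -Inf; the observed values are then put in place, and because a cumulative sum never decreases, cummax replaces each placeholder with the last observed value. This assumes df is sorted by Month.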

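# Assumes df is the question's data, sorted by Month
transform(
  # One row per month, with -Inf as a placeholder value
  data.frame(
    Month = seq(min(df$Month), max(df$Month), by = "1 month"),
    CumulativeSum = -Inf
  ),
  # Put the observed values in place; cummax() then carries the last one forward
  CumulativeSum = cummax(replace(CumulativeSum, Month %in% df$Month, df$CumulativeSum))
)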

which gives

       Month CumulativeSum
1 2019-02-01            40
2 2019-03-01            70
3 2019-04-01            80
4 2019-05-01            80
5 2019-06-01            80
6 2019-07-01           100
7 2019-08-01           120
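The cummax trick relies on the values never decreasing, which holds for a running total. If the series could drop (say, plain monthly values instead of a cumulative sum), a minimal base R sketch of last-observation-carried-forward that makes no such assumption is below. The helper name fill_months is hypothetical, and the data is the same as the question's.

```r
# Hypothetical helper: expand to every month and carry the last observed
# value forward, without assuming the values are non-decreasing.
fill_months <- function(df) {
  df <- df[order(df$Month), ]
  all_months <- seq(min(df$Month), max(df$Month), by = "1 month")
  # Row of each month in the original data (NA where the month is missing)
  idx <- match(all_months, df$Month)
  # Position of the most recent observed month: cummax over row positions,
  # with 0 standing in for missing months
  last_obs <- cummax(ifelse(is.na(idx), 0L, seq_along(idx)))
  data.frame(
    Month = all_months,
    CumulativeSum = df$CumulativeSum[idx[last_obs]]
  )
}

df1 <- data.frame(
  Month = as.Date(c("2019-02-01", "2019-03-01", "2019-04-01",
                    "2019-07-01", "2019-08-01")),
  CumulativeSum = c(40L, 70L, 80L, 100L, 120L)
)
fill_months(df1)
```

Here cummax is applied to row positions, which always increase, rather than to the values themselves, so the result is correct for any series.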

Upvotes: 2

akrun

Reputation: 886948

We can use complete from tidyr to insert the missing months, then fill to carry the last known value forward:

library(dplyr)
library(tidyr)
df1 %>%
  complete(Month = seq(min(Month), max(Month), by = '1 month')) %>%
  fill(CumulativeSum)

Output:

# A tibble: 7 x 2
#  Month      CumulativeSum
#  <date>             <int>
#1 2019-02-01            40
#2 2019-03-01            70
#3 2019-04-01            80
#4 2019-05-01            80
#5 2019-06-01            80
#6 2019-07-01           100
#7 2019-08-01           120

Data:

df1 <- structure(list(Month = structure(c(17928, 17956, 17987, 18078, 
18109), class = "Date"), CumulativeSum = c(40L, 70L, 80L, 100L, 
120L)), row.names = c(NA, -5L), class = "data.frame")
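The dput() output above stores Month as the number of days since 1970-01-01 with class "Date". A sketch of an equivalent, more readable construction of the same data (the name df1_readable is just for illustration):

```r
# Same data as the dput() output, written with explicit date strings
df1_readable <- data.frame(
  Month = as.Date(c("2019-02-01", "2019-03-01", "2019-04-01",
                    "2019-07-01", "2019-08-01")),
  CumulativeSum = c(40L, 70L, 80L, 100L, 120L)
)

# Dates are stored internally as days since 1970-01-01
as.numeric(df1_readable$Month)
# [1] 17928 17956 17987 18078 18109
```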

Upvotes: 3
