Replace NA values in dataframe starting in varying columns

Question

This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column with a large number of series by row. Some missing values are genuine but some should be zero. I want to replace missing values for a given series with zeros but only after a value for that series has been observed.

For example, given:

   Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2    NA   3    2    1
3    NA   2    1   NA
4    NA   NA   2   NA
5    2    2    NA   2

I want to change this to:

   Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2    NA   3    2    1
3    NA   2    1    0
4    NA   NA   2    0
5    2    2    0    2

I want something like the locf function, which is able to leave missing values prior to the first positive observation, but I want to fill with zeros rather than use the last observation.

lmo · Accepted Answer

Here is another base R method using matrix indexing:

df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
df
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2

is.na(df) returns a logical matrix indicating the location of NA values. This is (logically) chained to t(apply(!is.na(df), 1, cummax)) which indicates if a non-NA value occurred in a previous row element. elements of the data.frame for which both of these are TRUE are replaced with 0.

Replace NA values in dataframe starting in varying columns

Answers (2)

data

Related Questions