Andrew Eaves

Reputation: 218

Replace NA values in dataframe starting in varying columns

This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column with a large number of series by row. Some missing values are genuine but some should be zero. I want to replace missing values for a given series with zeros but only after a value for that series has been observed.

For example, given:

  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1   NA
4   NA   NA    2   NA
5    2    2   NA    2

I want to change this to:

  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2

I want something like the locf function, which is able to leave missing values prior to the first positive observation, but I want to fill with zeros rather than use the last observation.

Upvotes: 0

Views: 270

Answers (2)

akrun

Reputation: 887891

We can use apply with MARGIN = 1 to process each row: find the position of the first non-NA element, build the sequence of positions from there to the last element, and replace the NA values within that range with 0.

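df1[] <- t(apply(df1, 1, function(x) {
        # positions from the first non-NA value to the end of the row
        i1 <- which(!is.na(x))[1]:length(x)
        # within that range, turn the remaining NAs into 0
        x[i1] <- replace(x[i1], is.na(x[i1]), 0)
        x}))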
df1
#    Mth1 Mth2 Mth3 Mth4
#1    1    2    1    3
#2   NA    3    2    1
#3   NA    2    1    0
#4   NA   NA    2    0
#5    2    2    0    2
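
One case the example data does not cover is a row with no observations at all. There, which(!is.na(x))[1] is NA and NA:length(x) raises an error. Below is a minimal sketch of a guarded per-row helper; the name fill_after_first is hypothetical and not part of the answer above.

```r
# Hypothetical helper (an assumption, not from the answer):
# zero-fill NAs that come after the first observed value in a vector.
fill_after_first <- function(x) {
  first <- which(!is.na(x))[1]      # index of the first non-NA, or NA if there is none
  if (is.na(first)) return(x)       # nothing observed yet: leave the row untouched
  idx <- first:length(x)            # positions from the first observation onward
  x[idx] <- replace(x[idx], is.na(x[idx]), 0)
  x
}

fill_after_first(c(NA, 2, 1, NA))   # leading NA kept, trailing NA becomes 0
fill_after_first(c(NA, NA, NA))     # all-NA input is returned unchanged
```

It could be dropped into the same call, e.g. df1[] <- t(apply(df1, 1, fill_after_first)).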

Another option is regex-based. We paste each row into a single string, replace any NA that follows a number and whitespace with 0, and read the result back with read.table.

read.table(text=gsub("(?<=[0-9])\\s+NA", " 0", 
   do.call(paste, df1), perl = TRUE), header=FALSE, col.names  = names(df1))
#   Mth1 Mth2 Mth3 Mth4
#1    1    2    1    3
#2   NA    3    2    1
#3   NA    2    1    0
#4   NA   NA    2    0
#5    2    2    0    2
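
A caveat, as far as I can tell: the lookbehind only matches an NA that directly follows a digit, so in a run of consecutive NAs after a value only the first one is replaced in a single gsub pass. The example data has no such run, so the output above is unaffected. The sketch below illustrates this on a made-up string and shows one possible workaround, repeating the substitution until nothing changes; fill_line is a hypothetical name.

```r
pattern <- "(?<=[0-9])\\s+NA"
line <- "2 NA NA 3"                 # made-up row with two NAs after a value

# Single pass, as in the answer: the second NA follows "NA", not a digit
once <- gsub(pattern, " 0", line, perl = TRUE)

# Hypothetical workaround: re-apply the substitution until a fixed point
fill_line <- function(s) {
  repeat {
    out <- gsub(pattern, " 0", s, perl = TRUE)
    if (identical(out, s)) return(out)
    s <- out
  }
}

once
fill_line(line)
```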

data

df1 <- structure(list(Mth1 = c(1L, NA, NA, NA, 2L), Mth2 = c(2L, 3L, 
2L, NA, 2L), Mth3 = c(1L, 2L, 1L, 2L, NA), Mth4 = c(3L, 1L, NA, 
NA, 2L)), .Names = c("Mth1", "Mth2", "Mth3", "Mth4"), class = "data.frame", 
 row.names = c("1", "2", "3", "4", "5"))

Upvotes: 1

lmo

Reputation: 38520

Here is another base R method using matrix indexing:

df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
df
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2

is.na(df) returns a logical matrix marking the location of NA values. This is combined (with a logical AND) with t(apply(!is.na(df), 1, cummax)), which indicates whether a non-NA value has occurred at or before that position in the row. Elements of the data.frame for which both are TRUE are replaced with 0.
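
To make the two masks easier to inspect, the one-liner can be split into named steps. This is only a sketch: the variable names is_missing, seen and fill are made up for illustration, and the data frame is rebuilt from the question's example.

```r
# Example data from the question, named df to match this answer
df <- data.frame(Mth1 = c(1, NA, NA, NA, 2),
                 Mth2 = c(2, 3, 2, NA, 2),
                 Mth3 = c(1, 2, 1, 2, NA),
                 Mth4 = c(3, 1, NA, NA, 2))

is_missing <- is.na(df)                        # TRUE where a cell is NA
seen <- t(apply(!is_missing, 1, cummax)) == 1  # TRUE once a value has appeared in the row
fill <- is_missing & seen                      # NA cells located after an observed value

which(fill, arr.ind = TRUE)                    # row/column positions that will be zeroed
df[fill] <- 0
df
```

Because cummax of an all-FALSE row stays at 0, a row with no observations at all is left untouched by this approach.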

Upvotes: 4
