Nick Knauer
Nick Knauer

Reputation: 4243

Apply na.fill to every column

I have a dataset that looks like this:

Col1    Col2     Col3     Col4    Col5   
   A       B        4        5       7
   G       H        5        6      NA
   H       I       NA        9       8
   K       F        9       NA      NA
   E       L       NA        8       9
   H       I        1        0      10

How do I apply the na.fill() function to all the columns after Col2?

If I were to do it individually, it would be something like this:

df$Col3<-na.fill(df$Col3, c(NA, "extend", NA))
df$Col4<-na.fill(df$Col4, c(NA, "extend", NA))
df$Col5<-na.fill(df$Col5, c(NA, "extend", NA))  

The problem is that my actual dataframe has over 100 columns. Is there a quick way to apply this function to all the columns after the first 2?

Upvotes: 1

Views: 503

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269885

na.fill does handle multiple columns. Really no need to use lapply, mutate, etc. Just replace the relevant columns with the result of running na.fill on those same columns. If you know what ix is then you could replace the first line with it so that in this example we could alternately use ix <- 3:5 or ix <- -(1:2) .

ix <- sapply(DF, is.numeric)
replace(DF, ix, na.fill(DF[ix], c(NA, "extend", NA)))

giving:

  Col1 Col2 Col3 Col4 Col5
1    A    B    4  5.0  7.0
2    G    H    5  6.0  7.5
3    H    I    7  9.0  8.0
4    K    F    9  8.5  8.5
5    E    L    5  8.0  9.0
6    H    I    1  0.0 10.0

Note that you could alternately use na.approx:

replace(DF, ix, na.approx(DF[ix], na.rm = FALSE))

Note

Lines <- "Col1    Col2     Col3     Col4    Col5   
   A       B        4        5       7
   G       H        5        6      NA
   H       I       NA        9       8
   K       F        9       NA      NA
   E       L       NA        8       9
   H       I        1        0      10"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, strip.white = TRUE)

Upvotes: 1

Michael Herman
Michael Herman

Reputation: 41

The mutate_-family of functions in the dplyr package would do the trick.

There are a few ways to do this. Some may work better than others depending on what your other columns look like. Here are three versions that would work better in different circumstances.

# Make dummy data.
df <- data.frame(
    Col1 = LETTERS[1:6],
    Col2 = LETTERS[7:12],
    Col3 = c(4, 5, NA, 9, NA, 1),
    Col4 = c(5,6,9,NA,8,0),
    Col5 = c(7,NA,8,NA,9,10)
)

You can apply the na.fill function to columns specified by name vector. This is useful if you want to use a regular expression to select columns with certain name parts.

cn <- names(df) %>%
    str_subset("[345]")    # Column names with 3, 4 or 5 in them.

result_1 <- df %>% 
    mutate_at(vars(cn),
        zoo::na.fill, c(NA, 'extend', NA)
        )

You can apply the na.fill function to any numeric column.

result_2 <- df %>%
    mutate_if(is.numeric,    # First argument is function that returns a logical vector.
        zoo::na.fill, c(NA, 'extend', NA)
        )

You can apply the function to columns specified in an numeric index vector.

result_3 <- df
result_3[ , 3:5] <- result_3[ , 3:5] %>%    # Just replace columns 3 through 5
    mutate_all(
        zoo::na.fill, c(NA, 'extend', NA)
        )

In this case, all three versions should have done the same thing.

all.equal(result_1, result_2)    # TRUE
all.equal(result_1, result_3)    # TRUE

Upvotes: 0

Related Questions