perf2k
perf2k

Reputation: 43

R: In dataframe: set first non-NA value in column to NA

I have a large dataframe, 300+ columns (time series) with about 2600 observations. The columns are filled with a lot of NA's and then a short time series, and then typically NA's again. I would like to find the first non-NA value in each column and replace it with NA.

This is what I'm hoping to achieve, only with a much bigger dataframe:

Before:

   x1 x2 x3 x4
1  NA NA NA NA
2  NA NA NA NA
3   1  1 NA NA
4   2  2  1  1
5   3  3  2  2
6   4  4  3  3
7   5  5  4  4
8   6  6  5  5
9   7  7  6  6
10  8  8  7  7
11  9  9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA

After:

   x1 x2 x3 x4
1  NA NA NA NA
2  NA NA NA NA
3  NA NA NA NA
4   2  2 NA NA
5   3  3  2  2
6   4  4  3  3
7   5  5  4  4
8   6  6  5  5
9   7  7  6  6
10  8  8  7  7
11  9  9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA

I've searched around and found a way to do this for each column, but my efforts to apply it to the whole dataframe has proven difficult.

I have created an example dataframe to reproduce my original dataframe:

#Dataframe with NA
x1=x2=c(NA,NA,1:10,NA,NA)
x3=x4=c(NA,NA,NA,1:7,NA,NA,NA,NA)
df=data.frame(x1,x2,x3,x4)

I have used this to replace the first value with NA in 1 column (provided by @Joshua Ulrich here), however I would like to apply it to all columns without manually changing 300+ codes:

NonNAindex <- which(!is.na(df[,1]))
firstNonNA <- min(NonNAindex)
is.na(df[,1]) <- seq(firstNonNA, length.out=1)

I have tried to set the above as a function and run it for all columns with apply/lapply, as well as a for loop, but haven't really figured out how to apply the changes to my dataframe. I'm sure there is something I've completely overlooked as I'm just taking my first small steps in R.

All suggestions would be highly appreciated!

Upvotes: 4

Views: 1161

Answers (2)

akrun
akrun

Reputation: 886938

We can use base R

df1[] <- lapply(df1, function(x) replace(x, which(!is.na(x))[1], NA))
df1
#   x1 x2 x3 x4
#1  NA NA NA NA
#2  NA NA NA NA
#3  NA NA NA NA
#4   2  2 NA NA
#5   3  3  2  2
#6   4  4  3  3
#7   5  5  4  4
#8   6  6  5  5
#9   7  7  6  6
#10  8  8  7  7
#11  9  9 NA NA
#12 10 10 NA NA
#13 NA NA NA NA
#14 NA NA NA NA

Or as @thelatemail suggested

df1[] <- lapply(df1, function(x) replace(x, Position(Negate(is.na), x), NA))

Upvotes: 5

markdly
markdly

Reputation: 4534

Since you would like to do this for all columns, you could use the mutate_all function from dplyr. See http://dplyr.tidyverse.org/ for more information. In particular, you may want to look at some of the examples shown here.

library(dplyr)
mutate_all(df, funs(if_else(row_number() == min(which(!is.na(.))), NA_integer_, .)))
#>    x1 x2 x3 x4
#> 1  NA NA NA NA
#> 2  NA NA NA NA
#> 3  NA NA NA NA
#> 4   2  2 NA NA
#> 5   3  3  2  2
#> 6   4  4  3  3
#> 7   5  5  4  4
#> 8   6  6  5  5
#> 9   7  7  6  6
#> 10  8  8  7  7
#> 11  9  9 NA NA
#> 12 10 10 NA NA
#> 13 NA NA NA NA
#> 14 NA NA NA NA

Upvotes: 1

Related Questions