Jules.Sanchez

Reputation: 49

R: replacing NA values between two specific values by row

I am trying to get my data ready for a later sequence analysis. To do this, I need every NA that lies between a 1 and the following 2 in a row to become 1. I've included an example below; in my actual data each row is a unique individual and each column is a time period. A 1 represents admission to a program and a 2 represents discharge from it. I want the periods between admission and discharge to equal 1, to signify being enrolled in the program, and then to set the remaining NAs to 0, to signify not being in a program. There can be multiple admissions per row (individual).

I've been trying to use apply, which lets me change the 1 and 2 values themselves, but I can't work out how to replace the NAs that fall between a 1 and a 2. Any guidance would be much appreciated!

mdat <- matrix(c(1, NA, NA, NA, 2, NA, NA, 1, NA, 2,
                 NA, NA, 1, 2, NA, NA, NA, 1, NA, 2),
               nrow = 2, ncol = 10, byrow = TRUE,
               dimnames = list(c("row1", "row2"),
                               c("C.1", "C.2", "C.3", "C.4", "C.5", "C.6", "C.7", "C.8", "C.9", "C.10")))

|      | C.1 | C.2 | C.3 | C.4 | C.5 | C.6 | C.7 | C.8 | C.9 | C.10 |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| row1 | 1   | NA  | NA  | NA  | 2   | NA  | NA  | 1   | NA  | 2    |
| row2 | NA  | NA  | 1   | 2   | NA  | NA  | NA  | 1   | NA  | 2    |

The desired result:

|      | C.1 | C.2 | C.3 | C.4 | C.5 | C.6 | C.7 | C.8 | C.9 | C.10 |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| row1 | 1   | 1   | 1   | 1   | 2   | NA  | NA  | 1   | 1   | 2    |
| row2 | NA  | NA  | 1   | 2   | NA  | NA  | NA  | 1   | 1   | 2    |

Upvotes: 1

Views: 1147

Answers (3)

Carl Witthoft

Reputation: 21502

EDIT: completely different answer. I'm not entirely clear on what the OP wants, but this code will, albeit slowly, replace every NA in a run that follows a 1 with 1 (each filled position then acts as the preceding 1 for the next NA). I'm posting it just so that anyone with free time and a copy of microbenchmark can see how much better na.locf does.

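foo <- c(1, NA, 2, NA, 1, 2, 1, NA, NA, NA, 2, NA, NA)
foo
length(foo)
# walk left to right: an NA whose left neighbour is 1 becomes 1,
# so a whole run of NAs after a 1 gets filled
for (jj in 2:length(foo)) {
  if (!is.na(foo[jj - 1]) && foo[jj - 1] == 1 && is.na(foo[jj])) foo[jj] <- 1
}
foo
# then replace remaining `NA` with zero if desired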
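
For the matrix in the question, the same loop can be wrapped in a function and applied to each row. This is only a sketch: `fill_after_one` is a made-up helper name, and it assumes `mdat` is built as in the question.

```r
# Hypothetical helper (name is illustrative): fill NA with 1 while the
# value to its left is 1, scanning the vector from left to right.
fill_after_one <- function(x) {
  if (length(x) < 2) return(x)
  for (jj in 2:length(x)) {
    if (!is.na(x[jj - 1]) && x[jj - 1] == 1 && is.na(x[jj])) x[jj] <- 1
  }
  x
}

# Example data copied from the question
mdat <- matrix(c(1, NA, NA, NA, 2, NA, NA, 1, NA, 2,
                 NA, NA, 1, 2, NA, NA, NA, 1, NA, 2),
               nrow = 2, ncol = 10, byrow = TRUE,
               dimnames = list(c("row1", "row2"), paste0("C.", 1:10)))

# apply() returns one column per input row, so transpose back
filled <- t(apply(mdat, 1, fill_after_one))

# remaining NAs mean "not in a program"
filled[is.na(filled)] <- 0
filled
```

A row whose last 1 has no later 2 would be filled with 1 up to the end of the row; whether that is wanted depends on the data.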

Upvotes: 0

G. Grothendieck

Reputation: 269491

1) We can get a relatively compact solution by using na.locf from the zoo package to carry the last non-NA value forward along each row, and then setting to 1 every element of mdat whose filled-in counterpart is 1:

library(zoo)

replace(mdat, t(na.locf(t(mdat))) == 1, 1)

giving:

     C.1 C.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9 C.10
row1   1   1   1   1   2  NA  NA   1   1    2
row2  NA  NA   1   2  NA  NA  NA   1   1    2

2) Alternatively, use na.locf and then reset to NA any propagated 2s, that is, positions that are 2 in the filled-in version but NA in mdat. We use a dplyr pipeline (although this could be eliminated if desired):

library(dplyr)
library(zoo)

mdat %>% t %>% na.locf %>% t %>% replace(. == 2 & is.na(mdat), NA)
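
To make the idea behind both variants concrete without zoo, here is a rough base-R sketch of "last observation carried forward" followed by the replacement from (1). The helper `locf` is hypothetical and only mimics what na.locf is used for here; it is not zoo's implementation.

```r
# Hypothetical stand-in for last-observation-carried-forward on a vector.
# Leading NAs (nothing to carry forward yet) stay NA.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values so far (0 before the first)
  c(NA, x[!is.na(x)])[seen + 1]    # look up the most recent non-NA value
}

# Example data copied from the question
mdat <- matrix(c(1, NA, NA, NA, 2, NA, NA, 1, NA, 2,
                 NA, NA, 1, 2, NA, NA, NA, 1, NA, 2),
               nrow = 2, ncol = 10, byrow = TRUE,
               dimnames = list(c("row1", "row2"), paste0("C.", 1:10)))

# Fill each row, then transpose back to the original orientation
carried <- t(apply(mdat, 1, locf))

# Positions where the carried-forward value is 1 lie inside an open admission;
# which() drops the NA comparisons that come from leading NAs
result <- mdat
result[which(carried == 1)] <- 1
result
```

Carried-forward 2s are simply ignored, so the NAs after a discharge are left untouched.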

Upvotes: 2

Cath

Reputation: 24074

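If I understand correctly what you need, you can first replace the NAs with 0 and then, row by row, compare the running count of 2s with the running count of 1s: wherever fewer 2s than 1s have been seen so far, the position lies inside a "gap" between an admission and its discharge and is set to 1 (the result is then transposed in order to keep the original layout):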

mdat[is.na(mdat)] <- 0
mdat <- t(apply(mdat, 1, function(x) {x[cumsum(x==2) < cumsum(x==1)] <- 1; x}))
mdat
#     C.1 C.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9 C.10
#row1   1   1   1   1   2   0   0   1   1    2
#row2   0   0   1   2   0   0   0   1   1    2

Upvotes: 4
