Anthony O'Brien

Reputation: 73

In R: How to replace NA in a Vector found between two integers

I have the following vector:

A:(NA NA NA NA 1 NA NA 4 NA NA 1 NA NA NA NA NA 4 NA 1 NA 4)

I would like to replace all the NAs between a 1 and the following 4 with 2 (but not the NAs between a 4 and the following 1).

Are there any approaches you would recommend/use for this task?

It may also be managed as a dataframe:

 A 
----
 NA 
 NA 
 NA 
 NA 
 1 
 NA 
 NA 
 4 
 NA 
 NA 
 1 
 NA 
 NA 
 NA 
 NA 
 NA
 4 
 NA 
 1
 NA 
 4
----

Edit: 1. I changed the string "Na" to NA.

SOLUTION/UPDATE: Thank you to everyone for your insights. Building on them, I came up with the following solution for my case. I hope it is useful to someone else:

A <- c(df$A)

index.1<-which(df$A %in% c(1)) # define location for 1s in A
index.14<-which(df$A %in% c(1,4)) # define location for 1s and 4s in A

loc.1<-which(index.14 %in% index.1) # location of 1s in  index.14
loc.4<-loc.1+1 # location of 4s relative to 1s in index.14

start.i<-((index.14[loc.1])+1) # starting index for replacing with 2
end.i<-((index.14[loc.4])-1) # ending index for replacing with 2 in index

fill.v<-sort(c(start.i, end.i)) # start and end indexes of the ranges to fill with 2

# create matrix of beginning and ending sequence
fill.m<-matrix(fill.v,nrow = (length(fill.v)/2),ncol = 2, byrow=TRUE) 

# create a list with indexes to replace
list.1<-apply(fill.m, MARGIN=1,FUN=function(x) seq(x[1],x[2])) 

# unlist list to use as the indexes for replacement
list.2<-unlist(list.1) 

df$A[list.2] <- 2 # replace indexed location with 2

Upvotes: 5

Views: 345

Answers (3)

G. Grothendieck

Reputation: 270348

Assuming A is as shown reproducibly in the Note at the end, the difference of the two cumsums is 1 (which is treated as TRUE) from each 1 up to, but not including, the following 4, and 0 elsewhere. The second condition then drops the remaining endpoint, the 1 itself, by keeping only the "Na" positions. Finally, replace sets the positions that are still TRUE to 2.

replace(A, (cumsum(A == 1) - cumsum(A == 4)) & (A == "Na"), 2)

giving:

 [1] "Na" "Na" "Na" "Na" "1"  "2"  "2"  "4"  "Na" "Na" "1"  "2"  "2"  "2"  "2" 
[16] "2"  "4"  "Na" "1"  "2"  "4"

NA values

R is case sensitive and Na is not the same as NA. The sample data in the question showed Na values and not NA values but if what was actually meant was a numeric vector with NA values as in AA in the Note below then modify the expression to be as shown here:

replace(AA, cumsum(!is.na(AA) & AA == 1) - cumsum(!is.na(AA) & AA == 4) & is.na(AA), 2)

giving:

[1] NA NA NA NA  1  2  2  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4

Note

A <- c("Na", "Na", "Na", "Na", "1", "Na", "Na", "4", "Na", "Na", 
"1", "Na", "Na", "Na", "Na", "Na", "4", "Na", "1", "Na", "4")

AA <- as.numeric(replace(A, A == "Na", NA))
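
To make the cumsum step easier to follow, here is a minimal sketch that builds the same expression in pieces for the numeric case. The intermediate names is1, is4 and inside are only illustrative, and the sketch assumes that the 1s and 4s strictly alternate, starting with a 1, as they do in the sample data.

```r
AA <- c(NA, NA, NA, NA, 1, NA, NA, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

is1 <- !is.na(AA) & AA == 1   # TRUE only at the 1s
is4 <- !is.na(AA) & AA == 4   # TRUE only at the 4s

# Running count of 1s seen minus running count of 4s seen:
# 1 from each 1 up to (but not including) its closing 4, 0 elsewhere.
inside <- cumsum(is1) - cumsum(is4)

# Keep only the NA positions inside an open range and set them to 2.
result <- replace(AA, inside == 1 & is.na(AA), 2)

print(inside)
print(result)
```

Printing inside shows a 0/1 vector that switches on at every 1 and off at every 4, which is why the extra is.na(AA) condition is needed to leave the 1 itself untouched.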

Upvotes: 5

Andrew

Reputation: 5138

This should work as well. I assumed you were referring to NAs and not the string "Na", but it would work for either (or a mix).

> A <- c(NA, NA, NA, NA, 1, NA, NA, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)
> 
> btw_1_4 <- unlist(lapply(Map(`:`, which(A == 1), which(A == 4)), function(x) x[2:(length(x)-1)]))
> 
> A[btw_1_4] <- 2
> 
> A
 [1] NA NA NA NA  1  2  2  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4

Map(`:`, which(A == 1), which(A == 4))

Creates a list of positions for 1-4 ranges in the vector (in order)

lapply(Map_List, function(x) x[2:(length(x)-1)]) Removes the first and last element of each vector in the list (the position of 1 and 4)

unlist makes all the remaining positions (NA's between 1 and 4) a single vector
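
One edge case may be worth keeping in mind: if a 1 is immediately followed by a 4, the range has length 2, so 2:(length(x)-1) becomes 2:1 and selects both endpoints instead of nothing. The following is a hedged sketch of a variant that returns an empty range in that situation. The names starts, ends and between are illustrative, the sample vector is made up to include an adjacent 1, 4 pair, and the sketch assumes every 1 has a matching later 4 in order.

```r
# Made-up example with an adjacent 1, 4 pair at positions 6 and 7
A <- c(NA, 1, NA, NA, 4, 1, 4, NA, 1, NA, 4)

starts <- which(A == 1)   # which() drops the NA comparisons
ends   <- which(A == 4)

# Positions strictly between each 1 and its 4;
# an adjacent pair contributes integer(0) rather than 2:1.
between <- unlist(Map(
  function(s, e) if (e - s > 1) (s + 1):(e - 1) else integer(0),
  starts, ends
))

A[between] <- 2
print(A)
```

The 1 and 4 at positions 6 and 7 are left untouched, while the NAs inside the other two ranges are filled with 2.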

Upvotes: 1

f.lechleitner

Reputation: 3812

I'm sure there's a better solution to this problem but this should do the trick:

A <-
  c(NA, NA, NA, NA, 1, NA, NA, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

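replace <- FALSE  # TRUE while we are inside an open 1 ... 4 range

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      A[start:(i - 1)] <- 2  # fill everything between the 1 and this 4
      replace <- FALSE
    }
  }
}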

EDIT: if you only want to replace the NAs when there's nothing else (for example a 3) between the 1 and the 4, you could use this:

A <-
  c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

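replace <- FALSE  # TRUE while we are inside an open 1 ... 4 range

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      A[start:(i - 1)] <- 2  # fill everything between the 1 and this 4
      replace <- FALSE
    }
    if (A[i] != 4 && A[i] != 1) {
      replace <- FALSE  # any other value cancels the open range
    }
  }
}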

Output:

> A
 [1] NA NA NA NA  1 NA  3  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4

And if you only want to replace NAs but keep other values between 1 and 4 use this:

A <-
  c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

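replace <- FALSE  # TRUE while we are inside an open 1 ... 4 range

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      sub <- A[start:(i - 1)]  # values between the 1 and this 4
      sub[is.na(sub)] <- 2     # only overwrite the NAs
      A[start:(i - 1)] <- sub
      replace <- FALSE
    }
  }
}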

Output:

> A
 [1] NA NA NA NA  1  2  3  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4
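
If the loop is needed in more than one place, it could be wrapped in a function. The following is only a sketch of that idea: fill_between and its arguments from, to and value are hypothetical names, not part of base R. It keeps other values between the endpoints, as in the last variant above, and uses seq_len so that a 1 immediately followed by a 4 produces an empty range.

```r
# Hypothetical helper: fill the NAs between `from` and the next `to` with `value`.
fill_between <- function(x, from = 1, to = 4, value = 2) {
  start <- NA_integer_  # position after the most recent `from`; NA if no range is open
  for (i in seq_along(x)) {
    if (is.na(x[i])) next
    if (x[i] == from) {
      start <- i + 1L
    } else if (x[i] == to && !is.na(start)) {
      idx <- seq_len(i - start) + start - 1L  # empty when `from` and `to` are adjacent
      idx <- idx[is.na(x[idx])]               # only overwrite NAs
      x[idx] <- value
      start <- NA_integer_                    # close the range
    }
  }
  x
}

A <- c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)
filled <- fill_between(A)
print(filled)
```

Returning the modified vector instead of changing A in place also avoids reusing the name replace, which otherwise masks the base function of the same name.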

Upvotes: 2
