Hard_Course

Reputation: 311

Replace the last non NA entry of a row with the value from another column

I have a dataset like this:

  tdf <- data.frame(
    A = c(1, 2, 5, 4),
    B = c(NA, 3, 4, 5),
    C = c(NA, NA, NA, NA),
    D = c(NA, NA, NA, NA),
    E = c(10, 20, 30, 40))

And I need to replace the last non-NA value in each row (reading left to right) with the value from another column, say E, to give this:

  enddf <- data.frame(
    A = c(10, 2, 5, 4),
    B = c(NA, 20, 30, 40),
    C = c(NA, NA, NA, NA),
    D = c(NA, NA, NA, NA),
    E = c(10, 20, 30, 40))

I'm reading multiple data frames with varying numbers of columns, so I'm thinking a function that scans across all columns but the last one would work?

I tried something like this, but I feel like I'm going the wrong direction:

df <- tdf %>%
    rowwise() %>%
    mutate(
      A = ifelse(is.na(A), A, ifelse(!is.na(B), C, A)),
      B = ifelse(!is.na(A) & is.na(B), C, B)
    )

Upvotes: 6

Views: 191

Answers (6)

G. Grothendieck

Reputation: 270045

Let ix be the column number before the first NA, one per row. If there are no NA's then return 1 via the nomatch argument of match which becomes 0 after subtracting 1. In the first test below ix is c(1, 2, 2, 2) and in the second is c(1, 2, 2, 0, 0). Then insert tdf$E into successive rows such that the first element of ix corresponds to row 1, the second to row 2 and so on. An ix element of 0 means skip the insertion for that row so that that row remains unchanged.

(The code works whether data is a data frame or matrix but if it were known to always be a matrix then [ix>0, ] in the second line could optionally be omitted due to a subtle difference between matrix and data frame indexing.)
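To make the index vector concrete, here is a minimal sketch that computes ix for the question's tdf on its own, outside of any function. It only illustrates the first step described above; the name pairs is introduced here purely for illustration.

```r
tdf <- data.frame(
  A = c(1, 2, 5, 4),
  B = c(NA, 3, 4, 5),
  C = c(NA, NA, NA, NA),
  D = c(NA, NA, NA, NA),
  E = c(10, 20, 30, 40))

# match(NA, row) is the position of the first NA in each row;
# subtracting 1 gives the column just before it (0 means "skip this row")
ix <- apply(tdf, 1, match, x = NA, nomatch = 1) - 1
ix  # values 1, 2, 2, 2 for this input

# (row, column) pairs that would receive the value from column E
pairs <- cbind(seq_along(ix), ix)[ix > 0, ]
pairs
```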

This solution has a number of advantages:

  • works for all examples provided. See tests.
  • inserts into a data frame rather than into a matrix which must then be converted.
  • is compact having only two lines of code
  • does not overwrite input which in general is good practice

Code

fillin <- function(data, col = ncol(data)) {
  ix <- apply(data, 1, match, x = NA, nomatch = 1) - 1
  replace(data, cbind(seq_along(ix), ix)[ix>0,], data[ix>0, col])
}

# tests 

identical(fillin(tdf), enddf)  # inputs from question
## [1] TRUE

identical(fillin(tdf2), enddf2)  # inputs from Note below
## [1] TRUE     

Note

Input used

tdf2 <- data.frame(
  A = c(1, 2, 5, NA,1),
  B = c(NA, 3, 4, NA,1),
  C = c(NA, NA, NA, NA,1),
  D = c(NA, NA, NA, NA,1),
  E = c(10, 20, 30, 40,1))

enddf2 <- data.frame(
  A = c(10, 2, 5, NA, 1),
  B = c(NA, 20, 30, NA, 1),
  C = c(NA, NA, NA, NA, 1),
  D = c(NA, NA, NA, NA, 1),
  E = c(10, 20, 30, 40, 1))

Update

Removed stray period in code and enhanced to support both the test example in the question and the new one in the comment under this question. Also made a number of other small changes.

Upvotes: -1

B. Christian Kamgang

Reputation: 6529

You could solve your problem as follows:

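# (row, col) positions of all NA cells, listed column by column
ij = which(is.na(tdf), arr.ind=TRUE)
# keep each row's first NA, unless it is in column 1; drop=FALSE keeps ij a matrix
ij = ij[!duplicated(ij[,1]) & ij[,2]>1, , drop=FALSE]
# step one column left, to the entry just before the first NA
ij[, 2] = ij[, 2]-1
tdf[ij] = tdf$E[ij[,1]]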
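As a hedged illustration of how the filtering step behaves, the sketch below traces the same idea on the five-row tdf2 from the other answer, where one row starts with NA and one row has no NA at all. The names na_pos, target and out are made up for this trace, and drop = FALSE is added here as a precaution for the single-row case.

```r
tdf2 <- data.frame(
  A = c(1, 2, 5, NA, 1),
  B = c(NA, 3, 4, NA, 1),
  C = c(NA, NA, NA, NA, 1),
  D = c(NA, NA, NA, NA, 1),
  E = c(10, 20, 30, 40, 1))

# every NA cell as a (row, col) pair, ordered column by column
na_pos <- which(is.na(tdf2), arr.ind = TRUE)

# first NA per row; row 4 is dropped because its first NA is in column 1,
# and row 5 never appears because it has no NA
target <- na_pos[!duplicated(na_pos[, 1]) & na_pos[, 2] > 1, , drop = FALSE]
target[, 2] <- target[, 2] - 1   # move to the cell just before the first NA
target                           # rows 1, 2, 3 with columns 1, 2, 2

out <- tdf2                      # work on a copy so the input is kept
out[target] <- tdf2$E[target[, 1]]
out
```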

Upvotes: 0

Rui Barradas

Reputation: 76641

A run-length encoding (rle) of is.na() gives you the position of the last non-NA on each row: the length of the first run is the number of leading non-NA values. Use that as the column number in an index matrix.

tdf <- data.frame(
  A = c(1, 2, 5, 4),
  B = c(NA, 3, 4, 5),
  C = c(NA, NA, NA, NA),
  D = c(NA, NA, NA, NA),
  E = c(10, 20, 30, 40))

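inx <- apply(tdf, 1L, \(x) {
  # length of the first run = number of leading non-NA values in the row
  i <- rle(is.na(x))$lengths[1L]
  # a run spanning the whole row (ncol(tdf) entries) means the row has no NA
  ifelse(i < ncol(tdf), i, NA_integer_)
})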
inx <- cbind(seq_len(nrow(tdf)), inx)
tdf[inx] <- tdf$E
tdf
#>    A  B  C  D  E
#> 1 10 NA NA NA 10
#> 2  2 20 NA NA 20
#> 3  5 30 NA NA 30
#> 4  4 40 NA NA 40

Created on 2025-02-21 with reprex v2.1.1
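To see what rle contributes, it can help to look at a single row. The sketch below uses the values of the second row of tdf as a plain vector, plus a row that starts with NA as a caveat; the variable names are illustrative only.

```r
x <- c(2, 3, NA, NA, 20)   # values of row 2 of tdf
r <- rle(is.na(x))
r$values    # FALSE TRUE FALSE: a non-NA run, an NA run, then column E
r$lengths   # 2 2 1
r$lengths[1L]   # 2, i.e. the last leading non-NA sits in column 2 (B)

# Caveat: if the row starts with NA, the first run is the NA run,
# so its length no longer counts leading non-NA values
x2 <- c(NA, NA, NA, NA, 40)
r2 <- rle(is.na(x2))
r2$values[1L]   # TRUE
```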

Upvotes: 4

tmfmnk

Reputation: 40171

One tidyverse option could be:

library(dplyr)
library(purrr)

tdf %>%
 mutate(pmap_dfr(across(c(A:D, E)), 
                 ~ `[<-`(c(...), max(which(!is.na(head(c(...), -1)))), tail(c(...), 1))))
                 
   A  B  C  D  E
1 10 NA NA NA 10
2  2 20 NA NA 20
3  5 30 NA NA 30
4  4 40 NA NA 40
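The one-liner applies the same small operation to every row. As a sketch in base R only, here is that operation unrolled on a single row (the second row of tdf); candidates and last_pos are names chosen for this illustration.

```r
x <- c(A = 2, B = 3, C = NA, D = NA, E = 20)

candidates <- head(x, -1)                    # everything except the last column
last_pos <- max(which(!is.na(candidates)))   # position of the last non-NA: 2
x[last_pos] <- tail(x, 1)                    # overwrite it with the last column's value
x
```

Note that a row whose candidate columns are all NA would make which() return an empty vector, and max() of an empty vector is -Inf with a warning, so such rows would need separate handling.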

Upvotes: 2

ThomasIsCoding

Reputation: 102529

You can use max.col + is.na like below

> tdf[cbind(1:nrow(tdf), max.col(is.na(tdf), "first") - 1)] <- tdf$E

> tdf
   A  B  C  D  E
1 10 NA NA NA 10
2  2 20 NA NA 20
3  5 30 NA NA 30
4  4 40 NA NA 40
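To unpack the one-liner, the sketch below prints the intermediate result of max.col on a fresh copy of tdf. It also shows, as a caveat rather than part of the answer, what max.col returns for a row with no TRUE at all; first_na and no_na are illustrative names.

```r
tdf <- data.frame(
  A = c(1, 2, 5, 4),
  B = c(NA, 3, 4, 5),
  C = c(NA, NA, NA, NA),
  D = c(NA, NA, NA, NA),
  E = c(10, 20, 30, 40))

# is.na() gives a logical matrix; with ties.method = "first",
# max.col returns the column of the first TRUE (first NA) in each row
first_na <- max.col(is.na(tdf), ties.method = "first")
first_na       # 2 3 3 3
first_na - 1   # 1 2 2 2: the column just before the first NA

# Caveat: a row with no NA is all FALSE, so every column ties and
# "first" yields 1, which becomes column index 0 after subtracting 1
no_na <- max.col(matrix(FALSE, nrow = 1, ncol = 3), ties.method = "first")
no_na          # 1
```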

Upvotes: 4

Andre Wildberg

Reputation: 19191

A base R approach that looks for NA row-wise with apply.

wcol <- ncol(tdf)

data.frame(t(apply(tdf, 1, \(x) {
  x[which(is.na(x))[1] - 1] <- x[wcol]
  x
})))

output

   A  B  C  D  E
1 10 NA NA NA 10
2  2 20 NA NA 20
3  5 30 NA NA 30
4  4 40 NA NA 40

Since all values are numeric, no type conversion occurs in the intermediate matrix that apply produces.
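The t() call in the code above is needed because apply over rows returns each row's result as a column. A small sketch of that behaviour, using an identity function in place of the replacement step:

```r
tdf <- data.frame(
  A = c(1, 2, 5, 4),
  B = c(NA, 3, 4, 5),
  C = c(NA, NA, NA, NA),
  D = c(NA, NA, NA, NA),
  E = c(10, 20, 30, 40))

res <- apply(tdf, 1, \(x) x)   # each row comes back as one column
dim(res)                       # 5 4: columns A..E are now rows
dim(t(res))                    # 4 5: transposing restores the original layout
is.numeric(res)                # TRUE: the logical NA columns were coerced to numeric
```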

Upvotes: 6
