DaniCee
DaniCee

Reputation: 3217

Easiest way to replace non-NA values by column index

I have a data frame like the following, with some NAs:

mydf=data.frame(ID=LETTERS[1:10], aaa=runif(10), bbb=runif(10), ccc=runif(10), ddd=runif(10))
mydf[c(1,4,5,7:10),2]=NA
mydf[c(1,2,4:8),3]=NA
mydf[c(3,4,6:10),4]=NA
mydf[c(1,3,4,6,9,10),5]=NA

> mydf
   ID       aaa       bbb        ccc       ddd
1   A        NA        NA 0.08844614        NA
2   B 0.4912790        NA 0.88925139 0.1233173
3   C 0.1325188 0.1389260         NA        NA
4   D        NA        NA         NA        NA
5   E        NA        NA 0.60750723 0.6357998
6   F 0.8218579        NA         NA        NA
7   G        NA        NA         NA 0.5988206
8   H        NA        NA         NA 0.4008338
9   I        NA 0.8784563         NA        NA
10  J        NA 0.2959320         NA        NA

What I want to accomplish here is the following:

1- replace non-NA values by column index -1, so that the output looks like this:

> mydf
   ID       aaa       bbb        ccc       ddd
1   A        NA        NA          3        NA
2   B         1        NA          3         4
3   C         1         2         NA        NA
4   D        NA        NA         NA        NA
5   E        NA        NA          3         4
6   F         1        NA         NA        NA
7   G        NA        NA         NA         4
8   H        NA        NA         NA         4
9   I        NA         2         NA        NA
10  J        NA         2         NA        NA

2- Then I would like to add an extra column that shows the following:

The final result should look like this:

> mydf
   ID       aaa       bbb        ccc       ddd       final
1   A        NA        NA          3        NA           3
2   B         1        NA          3         4           0
3   C         1         2         NA        NA           0
4   D        NA        NA         NA        NA           0
5   E        NA        NA          3         4           0
6   F         1        NA         NA        NA           1
7   G        NA        NA         NA         4           4
8   H        NA        NA         NA         4           4
9   I        NA         2         NA        NA           2
10  J        NA         2         NA        NA           2

I could probably do all this with an ugly for loop, then aggregate for the final column, and substitute by 0 where appropriate...

But I was wondering if there would be a clean way to do this with some apply calls in just a few lines...

Thanks!

Upvotes: 3

Views: 250

Answers (3)

markus
markus

Reputation: 26373

A third option could be

tmp <- mydf[,-1]
tmp[!is.na(tmp)] <- 1
(mydf[,-1] <- tmp * as.list(1:4))
#   aaa bbb ccc ddd
#1   NA  NA   3  NA
#2    1  NA   3   4
#3    1   2  NA  NA
#4   NA  NA  NA  NA
#5   NA  NA   3   4
#6    1  NA  NA  NA
#7   NA  NA  NA   4
#8   NA  NA  NA   4
#9   NA   2  NA  NA
#10  NA   2  NA  NA

The final column can be generated like this

idx <- rowSums(tmp, na.rm = TRUE) == 1
mydf$final <- idx * max.col(replace(tmp, is.na(tmp), -Inf))

Result

mydf
#   ID aaa bbb ccc ddd final
#1   A  NA  NA   3  NA     3
#2   B   1  NA   3   4     0
#3   C   1   2  NA  NA     0
#4   D  NA  NA  NA  NA     0
#5   E  NA  NA   3   4     0
#6   F   1  NA  NA  NA     1
#7   G  NA  NA  NA   4     4
#8   H  NA  NA  NA   4     4
#9   I  NA   2  NA  NA     2
#10  J  NA   2  NA  NA     2

Upvotes: 0

Sotos
Sotos

Reputation: 51592

Here is an idea,

mydf1 <- cbind.data.frame(ID = mydf$ID, mapply(function(x, y) replace(x, !is.na(x), y), 
                                               mydf, 
                                               seq(ncol(mydf)) - 1)[,-1])

mydf1$final <- apply(mydf1[-1], 1, \(i) 
       ifelse(sum(is.na(i)) == (ncol(mydf) - 1) | sum(!is.na(i)) > 1, 0, i[!is.na(i)]))

mydf1
   ID  aaa  bbb  ccc  ddd final
1   A <NA> <NA>    3 <NA>     3
2   B    1 <NA>    3    4     0
3   C    1    2 <NA> <NA>     0
4   D <NA> <NA> <NA> <NA>     0
5   E <NA> <NA>    3    4     0
6   F    1 <NA> <NA> <NA>     1
7   G <NA> <NA> <NA>    4     4
8   H <NA> <NA> <NA>    4     4
9   I <NA>    2 <NA> <NA>     2
10  J <NA>    2 <NA> <NA>     2

Upvotes: 2

Allan Cameron
Allan Cameron

Reputation: 174506

You could do:

mydf[-1] <- sapply(1:4, \(x) x * mydf[x+1]/mydf[x+1])

mydf$final <- apply(mydf[-1], 1, function(x) { 
  if(all(is.na(x)) | sum(!is.na(x)) > 1) 0 else na.omit(x)
})

Result:

mydf
#>    ID aaa bbb ccc ddd final
#> 1   A  NA  NA   3  NA     3
#> 2   B   1  NA   3   4     0
#> 3   C   1   2  NA  NA     0
#> 4   D  NA  NA  NA  NA     0
#> 5   E  NA  NA   3   4     0
#> 6   F   1  NA  NA  NA     1
#> 7   G  NA  NA  NA   4     4
#> 8   H  NA  NA  NA   4     4
#> 9   I  NA   2  NA  NA     2
#> 10  J  NA   2  NA  NA     2

Created on 2022-12-16 with reprex v2.0.2

Upvotes: 3

Related Questions