Sandy
Sandy

Reputation: 1148

Min and Max across multiple columns with NAs

For the following sample data dat, is there a way to calculate min and max while handling NAs. My input is:

dat <- read.table(text = "ID  Name   PM      TP2   Sigma
                          1   Tim    1       2    3
                          2   Sam    0       NA   1
                          3   Pam    2       1    NA
                          4   Ali    1       0    2
                          NA  NA     NA      NA   NA
                          6   Tim    2       0    7", header = TRUE)

My required output is:

ID  Name  PM      TP2   Sigma  Min  Max
1    Tim  1       2    3       1    3
2    Sam  0       NA   1       0    1
3    Pam  2       1    NA      1    2
4    Ali  1       0    2       0    2
NA   NA   NA      NA   NA      NA   NA
6    Tim  2       0    7       0    7

My Effort

1- I have seen similar posts but none of them has discussed issues where all entries in a column were NAs e.g., Get the min of two columns Based on this, I have tried pmin() and pmax(), but they do not work for me.

2- Another similar question is minimum (or maximum) value of each row across multiple columns. Again, there is no need to handle NAs.

3- Lastly, this question minimum (or maximum) value of each row across multiple columns talks about NA but not all elements in a column have missing values.

4- Also, some of the solutions require that the columns list to be included to be excluded is typed manually, my original data is quite wide, I want to have an easier solution where I can express columns by numbers rather than names.

Partial Solution

I have tried the following solution but Min column ends up having Inf and the Max column ends up having -Inf.

dat$min = apply(dat[,c(2:4)], 1, min, na.rm = TRUE)
dat$max = apply(dat[,c(2:4)], 1, max, na.rm = TRUE)

I can manually get rid of Inf by using something like:

dat$min[is.infinite(dat$min)] = NA

But I was wondering if there is a better way of achieving my desired outcome? Any advice would be greatly appreciated.

Thank you for your time.

Upvotes: 5

Views: 3635

Answers (4)

GKi
GKi

Reputation: 39657

On way might be to use pmin and pmax with do.call:

dat$min <- do.call(pmin, c(dat[,c(3:5)], na.rm=TRUE))
dat$max <- do.call(pmax, c(dat[,c(3:5)], na.rm=TRUE))
dat
#  ID Name PM TP2 Sigma min max
#1  1  Tim  1   2     3   1   3
#2  2  Sam  0  NA     1   0   1
#3  3  Pam  2   1    NA   1   2
#4  4  Ali  1   0     2   0   2
#5 NA <NA> NA  NA    NA  NA  NA
#6  6  Tim  2   0     7   0   7

Upvotes: 2

wiebke
wiebke

Reputation: 111

I would use data.table for this task. I use the rowSums to count the numbers of row with na and compare it to the number of columns in total. I just use in dat.new all columns where you have at least one nonNA value. Then you can use the na.rm=T as usually.

I hope this little code helps you.


library(data.table)

#your data
dat <- read.table(text = "ID    PM      TP2   Sigma
                          1      1       2    3
                  2      0       NA   1
                  3      2       1    NA
                  4      1       0    2
                  NA     NA      NA   NA
                  5      2       0    7", header = TRUE)

#generate data.table and add id
dat <- data.table(dat)
number.cols <- dim(dat)[2] #4
dat[,id:=c(1:dim(dat)[1])]
# > dat
#     ID PM TP2 Sigma id
# 1:  1  1   2     3  1
# 2:  2  0  NA     1  2
# 3:  3  2   1    NA  3
# 4:  4  1   0     2  4
# 5: NA NA  NA    NA  5
# 6:  5  2   0     7  6

#use new data.table to select all rows with at least one nonNA value
dat.new <- dat[rowSums(is.na(dat))<number.cols,]
dat.new[, MINv:=min(.SD, na.rm=T), by=id]
dat.new[, MAXv:=max(.SD, na.rm=T), by=id]

#if you need it merged to the old data
dat <- merge(dat, dat.new[,.(id,MINv,MAXv)], by="id")

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

You can use hablar's min_ and max_ function which returns NA if all values are NA.

library(dplyr)
library(hablar)

dat %>%
  rowwise() %>%
  mutate(min = min_(c_across(-ID)), 
         max = max_(c_across(-ID)))

You can also use this with apply -

cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))

#  ID PM TP2 Sigma min max
#1  1  1   2     3   1   3
#2  2  0  NA     1   0   1
#3  3  2   1    NA   1   2
#4  4  1   0     2   0   2
#5 NA NA  NA    NA  NA  NA
#6  5  2   0     7   0   7

Upvotes: 5

Sandy
Sandy

Reputation: 1148

The following solution seems to work with the transform() function:

dat <- transform(dat, min = pmin(PM, TP2, Sigma))
dat <- transform(dat, max = pmin(PM, TP2, Sigma))

Without using the transform() function, the data seemed to mess up. Also, the above command requires that all column names are written explicitly. I do not understand why writing a short version like below, fails.

pmin(dat[,2:4])) or
pmax(dat[,2:4]))

I am posting the only solution that I could come up with, in case someone else stumbles upon a similar issue.

Upvotes: 1

Related Questions