Reputation: 1148
For the following sample data dat
, is there a way to calculate min
and max
while handling NA
s. My input is:
dat <- read.table(text = "ID Name PM TP2 Sigma
1 Tim 1 2 3
2 Sam 0 NA 1
3 Pam 2 1 NA
4 Ali 1 0 2
NA NA NA NA NA
6 Tim 2 0 7", header = TRUE)
My required output is:
ID Name PM TP2 Sigma Min Max
1 Tim 1 2 3 1 3
2 Sam 0 NA 1 0 1
3 Pam 2 1 NA 1 2
4 Ali 1 0 2 0 2
NA NA NA NA NA NA NA
6 Tim 2 0 7 0 7
My Effort
1- I have seen similar posts but none of them has discussed issues where all entries in a column were NA
s e.g., Get the min of two columns
Based on this, I have tried pmin()
and pmax()
, but they do not work for me.
2- Another similar question is minimum (or maximum) value of each row across multiple columns. Again, there is no need to handle NA
s.
3- Lastly, this question minimum (or maximum) value of each row across multiple columns talks about NA
but not all elements in a column have missing values.
4- Also, some of the solutions require that the columns list to be included to be excluded is typed manually, my original data is quite wide
, I want to have an easier solution where I can express columns by numbers rather than names.
Partial Solution
I have tried the following solution but Min
column ends up having Inf
and the Max
column ends up having -Inf
.
dat$min = apply(dat[,c(2:4)], 1, min, na.rm = TRUE)
dat$max = apply(dat[,c(2:4)], 1, max, na.rm = TRUE)
I can manually get rid of Inf
by using something like:
dat$min[is.infinite(dat$min)] = NA
But I was wondering if there is a better way of achieving my desired outcome? Any advice would be greatly appreciated.
Thank you for your time.
Upvotes: 5
Views: 3635
Reputation: 39657
On way might be to use pmin
and pmax
with do.call
:
dat$min <- do.call(pmin, c(dat[,c(3:5)], na.rm=TRUE))
dat$max <- do.call(pmax, c(dat[,c(3:5)], na.rm=TRUE))
dat
# ID Name PM TP2 Sigma min max
#1 1 Tim 1 2 3 1 3
#2 2 Sam 0 NA 1 0 1
#3 3 Pam 2 1 NA 1 2
#4 4 Ali 1 0 2 0 2
#5 NA <NA> NA NA NA NA NA
#6 6 Tim 2 0 7 0 7
Upvotes: 2
Reputation: 111
I would use data.table for this task. I use the rowSums to count the numbers of row with na and compare it to the number of columns in total. I just use in dat.new all columns where you have at least one nonNA value. Then you can use the na.rm=T as usually.
I hope this little code helps you.
library(data.table)
#your data
dat <- read.table(text = "ID PM TP2 Sigma
1 1 2 3
2 0 NA 1
3 2 1 NA
4 1 0 2
NA NA NA NA
5 2 0 7", header = TRUE)
#generate data.table and add id
dat <- data.table(dat)
number.cols <- dim(dat)[2] #4
dat[,id:=c(1:dim(dat)[1])]
# > dat
# ID PM TP2 Sigma id
# 1: 1 1 2 3 1
# 2: 2 0 NA 1 2
# 3: 3 2 1 NA 3
# 4: 4 1 0 2 4
# 5: NA NA NA NA 5
# 6: 5 2 0 7 6
#use new data.table to select all rows with at least one nonNA value
dat.new <- dat[rowSums(is.na(dat))<number.cols,]
dat.new[, MINv:=min(.SD, na.rm=T), by=id]
dat.new[, MAXv:=max(.SD, na.rm=T), by=id]
#if you need it merged to the old data
dat <- merge(dat, dat.new[,.(id,MINv,MAXv)], by="id")
Upvotes: 1
Reputation: 388907
You can use hablar
's min_
and max_
function which returns NA
if all values are NA
.
library(dplyr)
library(hablar)
dat %>%
rowwise() %>%
mutate(min = min_(c_across(-ID)),
max = max_(c_across(-ID)))
You can also use this with apply
-
cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))
# ID PM TP2 Sigma min max
#1 1 1 2 3 1 3
#2 2 0 NA 1 0 1
#3 3 2 1 NA 1 2
#4 4 1 0 2 0 2
#5 NA NA NA NA NA NA
#6 5 2 0 7 0 7
Upvotes: 5
Reputation: 1148
The following solution seems to work with the transform()
function:
dat <- transform(dat, min = pmin(PM, TP2, Sigma))
dat <- transform(dat, max = pmin(PM, TP2, Sigma))
Without using the transform()
function, the data seemed to mess up. Also, the above command requires that all column names are written explicitly. I do not understand why writing a short version like below, fails.
pmin(dat[,2:4])) or
pmax(dat[,2:4]))
I am posting the only solution that I could come up with, in case someone else stumbles upon a similar issue.
Upvotes: 1