Duck

Reputation: 39595

How to extract maximal value from a group of variables in a data frame

I am having trouble using max() to extract the maximum value across a group of variables. The data frame is shown below; all variables are numeric:

setosa  versicolor  virginica
    0   0.96969697  0.03030303
    0   0.05128205  0.94871795
    0   0.96969697  0.03030303
    1   0.00000000  0.00000000
    1   0.00000000  0.00000000
    0   0.05128205  0.94871795
    0   0.05128205  0.94871795
    0   0.05128205  0.94871795

When I apply the max() function to this data frame and save the result in a new variable, this happens:

DF$max=max(DF$setosa,DF$versicolor,DF$virginica)

setosa  versicolor  virginica   max
    0   0.96969697  0.03030303  1
    0   0.05128205  0.94871795  1
    0   0.96969697  0.03030303  1
    1   0.00000000  0.00000000  1
    1   0.00000000  0.00000000  1
    0   0.05128205  0.94871795  1
    0   0.05128205  0.94871795  1
    0   0.05128205  0.94871795  1

It seems that max() rounds the maximum value. I can't find my mistake; can you help me see what is wrong? Thanks.

Upvotes: 3

Views: 2658

Answers (3)

dickoa

Reputation: 18437

You can use pmax for that; it returns the element-wise (parallel) maximum of its arguments:

set.seed(123)
dat <- data.frame(matrix(rnorm(15), ncol = 3))



cbind(dat,
      max = pmax(dat$X1, dat$X2, dat$X3)
)

##         X1        X2       X3     max
## 1  0.42646  0.688640 -0.69471 0.68864
## 2 -0.29507  0.553918 -0.20792 0.55392
## 3  0.89513 -0.061912 -1.26540 0.89513
## 4  0.87813 -0.305963  2.16896 2.16896
## 5  0.82158 -0.380471  1.20796 1.20796
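
Applied to the data from the question, the same idea might look like the sketch below. The data frame is rebuilt by hand from the values in the post, and the `cols` and `row_max` names are only illustrative; `do.call` is shown as one way to hand every listed column to pmax without typing each one.

```r
# Rebuild the data frame from the question (values copied from the post)
DF <- data.frame(
  setosa     = c(0, 0, 0, 1, 1, 0, 0, 0),
  versicolor = c(0.96969697, 0.05128205, 0.96969697, 0, 0,
                 0.05128205, 0.05128205, 0.05128205),
  virginica  = c(0.03030303, 0.94871795, 0.03030303, 0, 0,
                 0.94871795, 0.94871795, 0.94871795)
)

# pmax compares the columns element by element,
# so every row gets its own maximum instead of one global value
DF$max <- pmax(DF$setosa, DF$versicolor, DF$virginica)

# Same computation, passing the columns as a list of arguments
cols <- c("setosa", "versicolor", "virginica")  # illustrative helper name
row_max <- do.call(pmax, DF[cols])
```

Both forms should give the per-row maxima; the `do.call` version is mainly convenient when there are many columns.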

Upvotes: 3

liuminzhao

Reputation: 2455

Your statement returns the maximum over all elements. Try using apply instead:

R > dat$max <-  apply(dat, 1, max)
R > dat
  setosa versicolor  virginica      max
1      0 0.96969697 0.03030303 0.969697
2      0 0.05128205 0.94871795 0.948718
3      0 0.96969697 0.03030303 0.969697
4      1 0.00000000 0.00000000 1.000000
5      1 0.00000000 0.00000000 1.000000
6      0 0.05128205 0.94871795 0.948718
7      0 0.05128205 0.94871795 0.948718
8      0 0.05128205 0.94871795 0.948718

Upvotes: 3

Simon O'Hanlon

Reputation: 59970

max returns a single value: the maximum of all the arguments passed to it. The maximum across all three columns in your data is 1, which is what max returns:

max(df$setosa,df$versicolor,df$virginica)
[1] 1

You then assign it to a new column in your data.frame. On assignment, R recycles the single value returned by max, repeating it until the new column is full, which here means once for each row of your data frame.
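
The recycling step can be seen in isolation with a small stand-in data frame. The values and the names `df_demo` and `overall` are made up for illustration and are not the asker's data.

```r
# Small stand-in data frame (hypothetical values, for illustration only)
df_demo <- data.frame(a = c(0, 1, 0),
                      b = c(0.9, 0, 0.2),
                      c = c(0.1, 0, 0.8))

# max() collapses all of its arguments into a single number
overall <- max(df_demo$a, df_demo$b, df_demo$c)
length(overall)  # 1

# Assigning a length-1 value to a column repeats it for every row
df_demo$max <- overall
df_demo$max      # 1 1 1
```

So the column of 1s in the question is not a rounding effect; it is the single global maximum repeated down the column.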

If you want the max of each column, do

apply( df , 2 , max )
   setosa versicolor  virginica 
 1.000000   0.969697   0.948718 

This applies the max function to each column and returns the result. If you want to know which row contains the max value for each column, use which.max like so:

apply( df , 2 , which.max )
 setosa versicolor  virginica 
     4          1          2 

And if you want the max across the values by row, set the MARGIN argument to apply to be 1 (here the MARGIN argument is set using positional matching rather than being named explicitly):

df$max <- apply( df , 1 , max )
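
As a hedged variation on the line above, the sketch below names the MARGIN and FUN arguments explicitly and restricts apply to the three original columns, so that re-running it after the max column exists does not feed that column back in. The data frame `df_rows` and the vector `vars` are hypothetical stand-ins.

```r
# Hypothetical stand-in data with the same column names as the question
df_rows <- data.frame(setosa     = c(0, 1, 0),
                      versicolor = c(0.97, 0, 0.05),
                      virginica  = c(0.03, 0, 0.95))

# Columns to take the row-wise maximum over (illustrative helper name)
vars <- c("setosa", "versicolor", "virginica")

# MARGIN = 1 means "by row"; FUN is applied to each row of the selected columns
df_rows$max <- apply(df_rows[vars], MARGIN = 1, FUN = max)
```

Note that apply converts the data frame to a matrix first, so this approach assumes the selected columns are all numeric, as they are in the question.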

Upvotes: 1
