Bhail

Reputation: 407

How best to index for max values in data frame?

The dataset used here is genotype, from the CRAN package MASS.

> names(genotype)
[1] "Litter" "Mother" "Wt"

> str(genotype)
'data.frame':   61 obs. of  3 variables:
 $ Litter: Factor w/ 4 levels "A","B","I","J": 1 1 1 1 1 1 1 1 1 1 ...
 $ Mother: Factor w/ 4 levels "A","B","I","J": 1 1 1 1 1 2 2 2 3 3 ...
 $ Wt    : num  61.5 68.2 64 65 59.7 55 42 60.2 52.5 61.8 ...

This is the question, taken from a tutorial: Exercise 6.7. Find the heaviest rats born to each mother in the genotype() data.

Using tapply, splitting by the factor genotype$Mother, gives:

> tapply(genotype$Wt, genotype$Mother, max)
   A    B    I    J 
68.2 69.8 61.8 61.0 

Also:

> out <- tapply(genotype$Wt, genotype[,1:2],max)
> out
      Mother
Litter    A    B    I    J
     A 68.2 60.2 61.8 61.0
     B 60.3 64.7 59.0 51.3
     I 68.0 69.8 61.3 54.5
     J 59.0 59.5 61.4 54.0

The first tapply gives the weight of the heaviest rat for each mother, and the second (out) gives a table that lets me identify which litter type was heaviest for each mother. Is there another way to find which Litter has the greatest weight for each mother, for instance when the two-dimensional table is very large?

Upvotes: 0

Views: 82

Answers (2)

akrun

Reputation: 887501

We could use data.table. We convert the 'data.frame' to 'data.table' (setDT(genotype)). Create the index using which.max and subset the rows of the dataset grouped by the 'Mother'.

library(data.table)#v1.9.5+
setDT(genotype)[, .SD[which.max(Wt)], by = Mother]
#   Mother Litter   Wt
#1:      A      A 68.2
#2:      B      I 69.8
#3:      I      A 61.8
#4:      J      A 61.0
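
Note that which.max returns only the first maximum within each group. If ties should be kept, a minimal sketch (assuming data.table is installed; the names gt and heaviest are just illustrative) could compare against the group maximum instead. It also uses as.data.table, which copies the data, rather than setDT, which converts genotype by reference:

```r
library(MASS)        # provides the genotype data
library(data.table)

# as.data.table() makes a copy, so the original data.frame is left untouched
gt <- as.data.table(genotype)

# Keep every row whose Wt equals the maximum within its Mother group
# (all tied rows are retained, unlike which.max)
heaviest <- gt[, .SD[Wt == max(Wt)], by = Mother]
print(heaviest)
```

If each mother has a single heaviest rat, this should return the same four rows as the which.max version.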

If we are only interested in the max of 'Wt' by 'Mother':

setDT(genotype)[, list(Wt=max(Wt)), by = Mother]
#   Mother   Wt
#1:      A 68.2
#2:      B 69.8
#3:      I 61.8
#4:      J 61.0

Based on the last tapply code shown by the OP, if we need similar output, we can use dcast from the devel version of 'data.table':

dcast(setDT(genotype), Litter ~ Mother, value.var='Wt', max)
#   Litter    A    B    I    J
#1:      A 68.2 60.2 61.8 61.0
#2:      B 60.3 64.7 59.0 51.3
#3:      I 68.0 69.8 61.3 54.5
#4:      J 59.0 59.5 61.4 54.0
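
A wide table like this can also be scanned programmatically rather than by eye. As a rough sketch in base R, using the matrix returned by the OP's second tapply call (the name best_litter is only illustrative), which.max can be applied down each Mother column:

```r
library(MASS)  # provides the genotype data

# Litter x Mother matrix of maximum weights, as in the question
out <- tapply(genotype$Wt, genotype[, 1:2], max)

# For each Mother column, find the row index of the largest value,
# then translate that index into the Litter label
idx <- apply(out, 2, which.max)
best_litter <- setNames(rownames(out)[idx], colnames(out))
print(best_litter)
#   A   B   I   J
# "A" "I" "A" "A"
```

This assumes every Litter/Mother combination is present; a column consisting only of NA would need separate handling.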

data

library(MASS)
data(genotype)

Upvotes: 3

Robert

Reputation: 5152

From stats:

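aggregate(Wt ~ Mother, data = genotype, max)

The variant aggregate(. ~ Mother, data = genotype, max) also applies max to the factor column Litter, which raises an error because max is not defined for unordered factors.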
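
aggregate on its own returns only the maximum weight, not the Litter it belongs to. One possible way to recover the matching rows in base R, sketched here with the illustrative names mx and res, is to merge the aggregated maxima back onto the original data:

```r
library(MASS)  # provides the genotype data

# Maximum weight per mother
mx <- aggregate(Wt ~ Mother, data = genotype, FUN = max)

# Join back on both Mother and Wt to pick up the full matching rows,
# including the Litter column (tied rows, if any, are all kept)
res <- merge(genotype, mx, by = c("Mother", "Wt"))
print(res)
```

Because the Wt values in mx are taken directly from genotype, the join on a numeric column matches exactly here; with computed values, an exact match on doubles would be less safe.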

Upvotes: 1
