SilvanD

Reputation: 325

Find the max per factor level and index by that max in R

This should be incredibly simple, but I'm not managing to figure it out. I want to get max value per group, which I do as follows.

ddply(dd,~group,summarise,max=max(value))

But in addition to the value and group, I also want to return another column, date, taken from the row that holds the max. Indexing the result as below obviously doesn't work. How do I do it? Thanks.

ddply(dd,~group,summarise,max=max(value))['date']  

Upvotes: 0

Views: 1227

Answers (2)

akrun

Reputation: 887421

With data.table (shown here on the iris dataset), convert the data.frame to a data.table, group by the grouping variable ('Species'), get the index of the max value of one variable ('Sepal.Length') with which.max, and use that index to subset the columns listed in .SDcols.

library(data.table)
dt <- as.data.table(iris)
dt[, .SD[which.max(Sepal.Length)], by = Species,
   .SDcols = c('Sepal.Length', 'Sepal.Width')]
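Applied to the question's own layout, the same idea might look like the sketch below. The table `dd` and its contents are made up for illustration; only the column names group, value, and date come from the question. Without .SDcols, .SD carries every non-grouping column, so date comes along with value.

```r
library(data.table)

# Hypothetical data shaped like the question's `dd` (values are invented).
dd <- data.table(
  group = c("a", "a", "a", "b", "b"),
  value = c(1, 5, 5, 2, 3),
  date  = c("2015-01-01", "2015-01-02", "2015-01-03",
            "2015-01-04", "2015-01-05")
)

# For each group, keep the first row that attains the maximum value.
res <- dd[, .SD[which.max(value)], by = group]
res
```

Because which.max returns a single index, group "a" (which has two rows with value 5) yields only the earlier of the two.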

Upvotes: 1

mathematical.coffee

Reputation: 56935

If you are after the date that corresponds to the row(s) with the max values, try subset to get the row of the max along with select to get the columns you're after.

# reproducible example using `iris`
library(plyr)

# your original
ddply(iris, ~Species, summarise, max=max(Sepal.Length))
#      Species max
# 1     setosa 5.8
# 2 versicolor 7.0
# 3  virginica 7.9


# now we want to get the Sepal.Width that corresponds to the max Sepal.Length too.
ddply(iris, ~Species, subset, Sepal.Length==max(Sepal.Length),
      select=c('Species', 'Sepal.Length', 'Sepal.Width'))
#      Species Sepal.Length Sepal.Width
# 1     setosa          5.8         4.0
# 2 versicolor          7.0         3.2
# 3  virginica          7.9         3.8

(Or instead of using select in the subset call, just use [, c('columns', 'I', 'want')] after the ddply). If there are multiple rows for the same species that attain the maximum, this will return all of them.
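The bracket-indexing alternative mentioned above could be sketched like this on iris; the variable names `all_cols` and `picked` are just illustrative.

```r
library(plyr)

# Keep every column in the subset step...
all_cols <- ddply(iris, ~Species, subset, Sepal.Length == max(Sepal.Length))

# ...then pick the columns of interest afterwards.
picked <- all_cols[, c('Species', 'Sepal.Length', 'Sepal.Width')]
picked
```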

You can use summarise to do it too, just add your date definition in the call, but it's a little less efficient (calculating the max twice):

ddply(iris, ~Species, summarise,
      max=max(Sepal.Length),
      width=Sepal.Width[which.max(Sepal.Length)])

This will only return one row per species, and if there are multiple flowers with the maximum sepal length for their species, only the first is returned (which.max returns the first of the matching indices).
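To see the difference between the two approaches when there are ties, here is a small sketch on made-up data shaped like the question's `dd` (the values are invented; only the column names group, value, and date come from the question).

```r
library(plyr)

# Hypothetical data: group "a" has two rows that share the maximum value.
dd <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(1, 5, 5, 2, 3),
  date  = c("2015-01-01", "2015-01-02", "2015-01-03",
            "2015-01-04", "2015-01-05")
)

# subset with == keeps every row that attains the group maximum.
all_max <- ddply(dd, ~group, subset, value == max(value))

# summarise with which.max keeps only the first such row per group.
first_max <- ddply(dd, ~group, summarise,
                   max  = max(value),
                   date = date[which.max(value)])
```

Here `all_max` has both tied rows for group "a", while `first_max` has exactly one row per group.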

Upvotes: 1
