bshelt141

Reputation: 1223

Removing ALL min and max values and then finding the mean in R

I have the following dataset:

wow <- data.frame(a = c(1, 1, 1, 2, 3, 4, 4), b = c(3, 4, 2, 6, 2, 6, 5), c = c(1, 6, 3, 6, 1, 8, 9))
print(wow)
  a b c
1 1 3 1
2 1 4 6
3 1 2 3
4 2 6 6
5 3 2 1
6 4 6 8
7 4 5 9

I need to remove all min and max values from each column, and then calculate the mean of the remaining values so that the result looks like this:

print(result)
    a  b    c
1 2.5  4 5.75

I found a similar question that has already been answered ("mean from row values in a dataframe excluding min and max values in R"), but the key difference is that the person asking it had only a single min and a single max value in each column, whereas my columns can contain the min or max value several times.

Upvotes: 3

Views: 2885

Answers (2)

MichaelChirico

Reputation: 34703

A data.table solution (1.9.5+, but can easily be back-fit) to return a data.frame-like object, which it seems you wanted:

library(data.table)
setDT(wow)[, lapply(.SD, function(x) mean(x[x > min(x) & x < max(x)]))]

or, a la @akrun

setDT(wow)[, lapply(.SD, function(x) mean(x[!x %in% range(x)]))]

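You may need na.rm = TRUE depending on your data; note that it has to go into min and max (or range) as well as mean, since a single NA otherwise propagates through each of them. There should also be a way to do this with .GRP, but I think it would end up longer than the above.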
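To make the NA caveat concrete, here is a minimal base R sketch. The helper name `mean_without_extremes` and the small vector `x` are made up for illustration and are not part of the question; the idea is simply to drop the NAs first and then apply the same `%in% range(x)` filter.

```r
# Hypothetical helper (not from the question): drop NAs first, then remove
# every occurrence of the min and max before averaging.
mean_without_extremes <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  mean(x[!x %in% range(x)])
}

# Illustrative vector with one missing value.
x <- c(1, 1, 2, 3, 5, 5, NA)

# With the NA dropped, the 1s and 5s are removed and 2 and 3 remain.
mean_without_extremes(x)
# [1] 2.5

# Without NA handling, range(x) is c(NA, NA), so the 1s and 5s are NOT
# removed and the result silently differs.
naive <- mean(x[!x %in% range(x)])
```

With the strict-inequality version (`x > min(x) & x < max(x)`) an NA in the column would instead give `NA`, which is at least easier to notice than the silent difference above.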

If you want result to be a vector, use sapply (in which case the data.frame solution is basically identical, and the only advantage of data.table is speed).
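For comparison, a rough base R sketch of the two result shapes, using the data from the question. The helper name `drop_extremes_mean` is an assumption introduced only for this example.

```r
# Data from the question.
wow <- data.frame(a = c(1, 1, 1, 2, 3, 4, 4),
                  b = c(3, 4, 2, 6, 2, 6, 5),
                  c = c(1, 6, 3, 6, 1, 8, 9))

# Hypothetical helper: mean of the values strictly between min and max.
drop_extremes_mean <- function(x) mean(x[x > min(x) & x < max(x)])

# sapply() simplifies the per-column results to a named numeric vector.
res_vec <- sapply(wow, drop_extremes_mean)

# lapply() returns a list, which as.data.frame() turns into a one-row
# data.frame like the layout shown in the question.
res_df <- as.data.frame(lapply(wow, drop_extremes_mean))
```

Which shape is more convenient depends on what happens next: the vector is handy for further arithmetic, while the one-row data.frame can be bound to other summaries with rbind.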

Upvotes: 5

akrun

Reputation: 887128

We could remove the values that are min and max in each column using %in%, and get the mean from the remaining values. This can be done with summarise_each from dplyr (deprecated in newer dplyr releases in favor of summarise() with across()):

library(dplyr)
summarise_each(wow, funs(mean(.[!. %in% c(min(.), max(.))])))
#    a b    c
#1 2.5 4 5.75

Or using base R:

sapply(wow, function(x) mean(x[!x %in% range(x)]))
#   a    b    c 
#2.50 4.00 5.75 
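One edge case worth keeping in mind with this approach: if a column has at most two distinct values, every element is either the min or the max, so nothing is left to average. A small sketch, with the helper name `mean_mid` and the test vectors being made up for illustration:

```r
# Hypothetical helper wrapping the same %in% range(x) filter.
mean_mid <- function(x) mean(x[!x %in% range(x)])

# Only two distinct values: everything is removed and mean() of an
# empty vector is NaN.
two_values <- mean_mid(c(2, 2, 5, 5))

# A constant column behaves the same way, since min equals max.
constant <- mean_mid(c(7, 7, 7))

# A column with values strictly between the extremes works as expected.
regular <- mean_mid(c(1, 1, 1, 2, 3, 4, 4))
# regular is 2.5
```

If NaN is not an acceptable result for such columns, it can be checked for with is.nan() afterwards.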

Upvotes: 5
