Reputation: 12937

Select the nth value of aggregated column after group by in R

Given df as follows:

  # group value
# 1     A     8
# 2     A     1
# 3     A     7
# 4     B     3
# 5     B     2
# 6     B     6
# 7     C     4
# 8     C     5

df <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L), .Label = c("A", "B", "C"), class = "factor"), value = c(8L, 
1L, 7L, 3L, 2L, 6L, 4L, 5L)), .Names = c("group", "value"), class = "data.frame", row.names = c(NA, 
-8L))

And a vector of indices (possibly with NA):

inds <- c(2,1,NA)

How we can get the nth element of column value per group, preferably in base R?

For example, based on inds, we want the second element of value in group A, first element in group B, NA in group C. So the result would be:

#[1] 1 3 NA

Upvotes: 4

Answers (5)

989

Reputation: 12937

I just did come up with another solution:

diag(aggregate(value~group, df, function(x) x[inds])[,-1])
#[1]  1  3 NA

Benchmarking

library(microbenchmark)
library(data.table)
df <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L), .Label = c("A", "B", "C"), class = "factor"), value = c(8L, 
1L, 7L, 3L, 2L, 6L, 4L, 5L)), .Names = c("group", "value"), class = "data.frame", row.names = c(NA, 
-8L))
inds <- c(2,1,NA)

f_Imo <- function(df) as.vector(mapply("[", with(df, split(value, group)), inds))
f_Osssan <- function(df) {lvls = levels(df$group);sapply(1:length(lvls),function(x) df$value[df$group==lvls[x]][inds[x]])}
f_User2321 <- function(df)  unlist(mapply(function(x, y) subset(df, group == x, value)[y,] ,levels(df$group), inds))
f_dww <- function(df) setDT(df)[, .SD[inds[.GRP], value], by=group][,V1]
f_m0h3n <- function(df) diag(aggregate(value~group, df, function(x) x[inds])[,-1])

all.equal(f_Imo(df), f_Osssan(df), f_User2321(df), f_dww(df), f_m0h3n(df))
# [1] TRUE

microbenchmark(f_Imo(df), f_Osssan(df), f_m0h3n(df), f_User2321(df), f_dww(df))

# Unit: microseconds
           # expr      min        lq       mean   median        uq      max neval
      # f_Imo(df)   71.004   85.1180   91.52996   91.748   96.8810  121.048   100
   # f_Osssan(df)  252.788  276.5265  318.70529  287.648  301.5495 2651.492   100
    # f_m0h3n(df) 1422.627 1555.4365 1643.47184 1618.740 1670.7095 4729.827   100
 # f_User2321(df) 2889.738 3000.3055 3148.44916 3037.945 3118.7860 6013.442   100
      # f_dww(df) 2960.740 3086.2790 3206.02147 3143.381 3250.9545 5976.229   100

Upvotes: 0

dww

Reputation: 31452

I know you said preferably in base R, but just for the record, here is a data.table way

setDT(df)[, .SD[inds[.GRP], value], by=group][,V1]
#[1]  1  3 NA

Upvotes: 1

User2321

Reputation: 3062

Using again mapply (but not nearly as elegant as IMO's answer):

 mapply(function(x, y) subset(df, group == x, value)[y,] ,levels(df$group), inds)

Upvotes: 1

lmo

Reputation: 38500

Here is a solution with mapply and split:

mapply("[", with(df, split(value, group)), inds)

which returns a named vector

 A  B  C 
 1  3 NA

with(df, split(value, group)) splits the data frame by group and returns a list of data frames. mapply takes that list and "inds" and applies the subsetting function "[" to each pairs of arguments.

Upvotes: 5

Silence Dogood

Reputation: 3597

Using levels and sapply you could do:

DF <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L), .Label = c("A", "B", "C"), class = "factor"), value = c(8L, 
1L, 7L, 3L, 2L, 6L, 4L, 5L)), .Names = c("group", "value"), class = "data.frame", row.names = c(NA, 
-8L))


inds <- c(2,1,NA)

lvls = levels(DF$group)

groupInds = sapply(1:length(lvls),function(x) DF$value[DF$group==lvls[x]][inds[x]]  )

groupInds
#[1]  1  3 NA

Upvotes: 2

Select the nth value of aggregated column after group by in R

Answers (5)

Related Questions