EfEx

Reputation: 33

Dividing columns by group (Grouping in data frame)

I would like to calculate relative response values by dividing each response column by its group mean. I have managed to produce a method that works, but it is tedious (and thus unsatisfying), because every group and response has to be written out by hand. My data set is very large and contains multiple groups and responses.

###############
# example

# used packages
library(plyr)

# sample data
group <- c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3))
a <- rnorm(9, 10, 1)  # some random data as response
b <- rnorm(9, 10, 1)
df <- data.frame(group, a, b)

# my approach
# means for each group and response
df.means <- ddply(df, "group", colwise(mean))

# clunky method
df$rel.a[df$group=="alpha"] <- 
    df$a[df$group=="alpha"]/df.means$a[df.means$group=="alpha"]
df$rel.a[df$group=="beta"] <- 
    df$a[df$group=="beta"]/df.means$a[df.means$group=="beta"]
# ... etc
df$rel.b[df$group=="gamma"] <- 
    df$b[df$group=="gamma"]/df.means$b[df.means$group=="gamma"]

#desired outcome (well, perhaps with no missing values)
df
###############

I have been using R for a while now, but I still struggle with trivial data handling procedures. I believe I must be missing something. How can I better handle these groups?

Upvotes: 3

Views: 2544

Answers (3)

David Arenburg

Reputation: 92282

With the data.table package you can do the whole thing quickly in one line, without creating df.means at all:

library(data.table)
setDT(df)[, paste0("rel.", names(df)[-1]) :=
            lapply(.SD, function(x) x/mean(x)),
          group]

This runs over all the columns in df (except group), by group, and divides each value by its group mean.


Edit: If you want to overwrite the original columns (as in the dplyr answer), you can do this with a small modification (remove the paste0 part):

setDT(df)[, names(df)[-1] := lapply(.SD, function(x) x/mean(x)), group]
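
A variant of the same idea, as a minimal sketch: it names the response columns explicitly through `.SDcols` rather than relying on `names(df)[-1]`, which would also pick up any extra columns already present in the data. The sample data, the object name `dt`, and the choice of `a` and `b` as the only responses are assumptions made for illustration.

```r
library(data.table)

# Hypothetical sample data mirroring the layout in the question
set.seed(1)
dt <- data.table(
  group = rep(c("alpha", "beta", "gamma"), each = 3),
  a = rnorm(9, 10, 1),
  b = rnorm(9, 10, 1)
)

# Assumption: a and b are the only response columns
resp_cols <- c("a", "b")
rel_cols <- paste0("rel.", resp_cols)

# Divide each response by its group mean and store the result in rel.* columns
dt[, (rel_cols) := lapply(.SD, function(x) x / mean(x)),
   by = group, .SDcols = resp_cols]

# Sanity check: every rel.* column should average to 1 within each group
check <- dt[, lapply(.SD, mean), by = group, .SDcols = rel_cols]
print(check)
```

Listing the columns up front also makes it safe to rerun the line after other columns have been added to the table.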

Upvotes: 2

Avraham

Reputation: 1719

If I understand you correctly, you can also do this easily in dplyr. Given the data above:

library(dplyr)
df %>% group_by(group) %>% mutate(aresp = a / mean(a), bresp = b / mean(b))

returns:

  group         a         b     aresp     bresp
1 alpha 10.052847  8.076405 1.0132828 0.8288214
2 alpha 10.002243 11.447665 1.0081822 1.1747888
3 alpha  9.708111  9.709265 0.9785350 0.9963898
4  beta 10.732693  7.483065 0.9751125 0.8202278
5  beta 11.719656 11.270522 1.0647824 1.2353754
6  beta 10.567513  8.615878 0.9601051 0.9443968
7 gamma 10.221040 11.181763 1.0035630 0.9723315
8 gamma 10.302611 11.286443 1.0115721 0.9814341
9 gamma 10.030605 12.031643 0.9848649 1.0462344

Upvotes: 1

talat

Reputation: 70256

This is straightforward with the dplyr package, the successor to plyr for data frames:

library(dplyr)
df %>% group_by(group) %>% mutate_each(funs(./mean(.)))

The . represents the data in each column (by group). mutate_each is used to modify each column except the grouping variables. You specify inside the funs argument which functions should be applied to each column.
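
Later dplyr releases deprecated `mutate_each()` and `funs()` in favour of `across()`. The following is a sketch of what the same operation might look like there, assuming dplyr 1.0.0 or newer; the sample data, the result name `rel`, and the `rel.` prefix are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Hypothetical sample data mirroring the layout in the question
set.seed(1)
df <- data.frame(
  group = rep(c("alpha", "beta", "gamma"), each = 3),
  a = rnorm(9, 10, 1),
  b = rnorm(9, 10, 1)
)

rel <- df %>%
  group_by(group) %>%
  # across() skips the grouping column, so everything() covers a and b only;
  # .names keeps the originals and writes the results to rel.a and rel.b
  mutate(across(everything(), ~ .x / mean(.x), .names = "rel.{.col}")) %>%
  ungroup()

print(rel)
```

Dropping the `.names` argument overwrites `a` and `b` in place, which matches the behaviour of the `mutate_each()` call.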

Upvotes: 4
