user3022875
user3022875

Reputation: 9018

correct way to add columns to data frame without loop

I have this "d" data frame that has 2 groups. In real life I have 20 groups.

d= data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d
       group value
    1      A     1
    2      A     2
    3      A     3
    4      A     4
    5      A     5
    6      A     6
    7      A     7
    8      A     8
    9      A     9
    10     A    10
    11     B   101
    12     B   102
    13     B   103
    14     B   104
    15     B   105
    16     B   106
    17     B   107
    18     B   108
    19     B   109
    20     B   110
    21     A 10000

I'd like to add 2 columns, "Upper" and "Lower" that are calculated at the GROUP below level. Since there are only 2 groups I can add the columns manually like this:

d= data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d
d$upper = ifelse(d$group=="A", quantile(d$value[d$group=="A"])[4]+ 2.5*IQR(d$value[d$group=="A"]), quantile(d$value[d$group=="B"])[4]+ 2.5*IQR(d$value[d$group=="B"])   )
d$lower = ifelse(d$group=="A", quantile(d$value[d$group=="A"])[4]- 2.5*IQR(d$value[d$group=="A"]), quantile(d$value[d$group=="B"])[4]- 2.5*IQR(d$value[d$group=="B"])   )


   group value upper lower
1      A     1    21  -4.0
2      A     2    21  -4.0
3      A     3    21  -4.0
4      A     4    21  -4.0
5      A     5    21  -4.0
6      A     6    21  -4.0
7      A     7    21  -4.0
8      A     8    21  -4.0
9      A     9    21  -4.0
10     A    10    21  -4.0
11     B   101   119  96.5
12     B   102   119  96.5
13     B   103   119  96.5
14     B   104   119  96.5
15     B   105   119  96.5
16     B   106   119  96.5
17     B   107   119  96.5
18     B   108   119  96.5
19     B   109   119  96.5
20     B   110   119  96.5
21     A 10000    21  -4.0

But when I have 20 or 30 columns whats the best way to add these columns without doing a loop?

Upvotes: 1

Views: 110

Answers (1)

Ryan C. Thompson
Ryan C. Thompson

Reputation: 42090

Groupwise operations can easily be done using dplyr's group_by function:

library(dplyr)
d <- data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d %>%
    group_by(group) %>%
    mutate(upper=quantile(value, 0.75) + 2.5*IQR(value),
           lower=quantile(value, 0.75) - 2.5*IQR(value))

This splits the data frame by the "group" variable and then computes the "upper" and "lower" columns separately for each group.

Upvotes: 1

Related Questions