Reputation: 9018
I have this "d" data frame that has 2 groups. In real life I have 20 groups.
d= data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d
group value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 A 7
8 A 8
9 A 9
10 A 10
11 B 101
12 B 102
13 B 103
14 B 104
15 B 105
16 B 106
17 B 107
18 B 108
19 B 109
20 B 110
21 A 10000
I'd like to add 2 columns, "Upper" and "Lower" that are calculated at the GROUP below level. Since there are only 2 groups I can add the columns manually like this:
d= data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d
d$upper = ifelse(d$group=="A", quantile(d$value[d$group=="A"])[4]+ 2.5*IQR(d$value[d$group=="A"]), quantile(d$value[d$group=="B"])[4]+ 2.5*IQR(d$value[d$group=="B"]) )
d$lower = ifelse(d$group=="A", quantile(d$value[d$group=="A"])[4]- 2.5*IQR(d$value[d$group=="A"]), quantile(d$value[d$group=="B"])[4]- 2.5*IQR(d$value[d$group=="B"]) )
group value upper lower
1 A 1 21 -4.0
2 A 2 21 -4.0
3 A 3 21 -4.0
4 A 4 21 -4.0
5 A 5 21 -4.0
6 A 6 21 -4.0
7 A 7 21 -4.0
8 A 8 21 -4.0
9 A 9 21 -4.0
10 A 10 21 -4.0
11 B 101 119 96.5
12 B 102 119 96.5
13 B 103 119 96.5
14 B 104 119 96.5
15 B 105 119 96.5
16 B 106 119 96.5
17 B 107 119 96.5
18 B 108 119 96.5
19 B 109 119 96.5
20 B 110 119 96.5
21 A 10000 21 -4.0
But when I have 20 or 30 columns whats the best way to add these columns without doing a loop?
Upvotes: 1
Views: 110
Reputation: 42090
Groupwise operations can easily be done using dplyr
's group_by
function:
library(dplyr)
d <- data.frame(group = c(rep("A",10),rep("B",10),"A"), value = c(seq(1,10,1),seq(101,110,1),10000))
d %>%
group_by(group) %>%
mutate(upper=quantile(value, 0.75) + 2.5*IQR(value),
lower=quantile(value, 0.75) - 2.5*IQR(value))
This splits the data frame by the "group" variable and then computes the "upper" and "lower" columns separately for each group.
Upvotes: 1