Reputation: 93
I have a data frame with two numerical variables fatcontent and saltcontent plus two factor variables cond and spice that describe the different treatments. In this data frame each measurement for the numerical varibles was taken twice.
a <- data.frame(cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
each = 2, times = 3),
spice = rep(c("none", "chilli", "basil"), each = 10),
fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)
Now, I wish to nomalise (that means divide in this case) the numerical variables for each spice group by the mean of the uncooked condition.
E.g. for a$spice == "none"
cond spice fatcontent saltcontent
1 uncooked none 4 2
2 uncooked none 5 5
3 fried none 6828 4733
4 fried none 7530 5500
5 steamed none 6910 5724
6 steamed none 7132 15885
7 baked none 5885 14885
8 baked none 613 217
9 grilled none 2845 193
10 grilled none 2867 148
After normalisation:
cond spice fatcontent saltcontent
1 uncooked none 0.8888889 0.5714286
2 uncooked none 1.1111111 1.4285714
3 fried none 1517.3333333 1352.2857143
4 fried none 1673.3333333 1571.4285714
5 steamed none 1535.5555556 1635.4285714
6 steamed none 1584.8888889 4538.5714286
7 baked none 1307.7777778 4252.8571429
8 baked none 136.2222222 62.0000000
9 grilled none 632.2222222 55.1428571
10 grilled none 637.1111111 42.2857143
My questions is how can I do this for all the groups and variables in the data frame? I assume I could use the dplyr package but I am not sure what is the best way. I appreciate any help!
Upvotes: 8
Views: 4015
Reputation: 70256
A succinct way to normalize the data would be to include the "uncooked" condition right in the mean calculation so you don't need to filter, summarise, join and recalculate. Doing this with mutate_each
means you only need to type it once.
group_by(a, spice) %>%
mutate_each(funs(./mean(.[cond == "uncooked"])), -cond)
#Source: local data frame [30 x 4]
#Groups: spice
#
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 5.714286e-01
#2 uncooked none 1.1111111 1.428571e+00
#3 fried none 1517.3333333 1.352286e+03
#4 fried none 1673.3333333 1.571429e+03
#5 steamed none 1535.5555556 1.635429e+03
#6 steamed none 1584.8888889 4.538571e+03
#7 baked none 1307.7777778 4.252857e+03
#8 baked none 136.2222222 6.200000e+01
#9 grilled none 632.2222222 5.514286e+01
#10 grilled none 637.1111111 4.228571e+01
# ... etc
Upvotes: 5
Reputation: 23574
I think this is what you are after. You want to find mean for each spice condition using uncooked data points. That is something I have done in my first step. Then, I wanted to add fatmean
and saltmean
in ana
to your data frame, a
. If your data is really huge, this may not be a memory efficient way. But, I used left_join
to merge ana
and a
. I, then, did division in mutate
for each spice condition. Finally, I dropped two columns for tidying up the results using select
.
### Find mean for each spice condition using uncooked data points
ana <- group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent))
# spice fatmean saltmean
#1 basil 5.0 1.0
#2 chilli 21.5 5.0
#3 none 4.5 3.5
left_join(a, ana, by = "spice") %>%
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))
# A part of the results
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 0.5714286
#2 uncooked none 1.1111111 1.4285714
#3 fried none 1517.3333333 1352.2857143
#4 fried none 1673.3333333 1571.4285714
#5 steamed none 1535.5555556 1635.4285714
#6 steamed none 1584.8888889 4538.5714286
#7 baked none 1307.7777778 4252.8571429
#8 baked none 136.2222222 62.0000000
#9 grilled none 632.2222222 55.1428571
#10 grilled none 637.1111111 42.2857143
If you do all things in one piping, it would be something like this:
group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent)) %>%
left_join(a, ., by = "spice") %>% #right_join is possible with the dev dplyr
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))
Upvotes: 4
Reputation: 38619
All you need to do is group by both the condition and spice, like so:
library(dplyr)
a %>% group_by(spice, cond) %>%
mutate(fat.norm = fatcontent / mean(fatcontent),
salt.norm = saltcontent / mean(saltcontent))
# Source: local data frame [90 x 6]
# Groups: spice, cond
#
# cond spice fatcontent saltcontent fat.norm salt.norm
# 1 uncooked none 4 2 0.8888889 0.57142857
# 2 uncooked none 5 5 1.1111111 1.42857143
# 3 fried none 6828 4733 0.9511074 0.92504642
# 4 fried none 7530 5500 1.0488926 1.07495358
# 5 steamed none 6910 5724 0.9841903 0.52977926
# 6 steamed none 7132 15885 1.0158097 1.47022074
# 7 baked none 5885 14885 1.8113266 1.97126208
# 8 baked none 613 217 0.1886734 0.02873792
# 9 grilled none 2845 193 0.9961485 1.13196481
# 10 grilled none 2867 148 1.0038515 0.86803519
Alternatively, if you don't want to specify each column, you can use mutate_each
or summarise_each
:
group.norm <- function(x) {
x / mean(x)
}
a %>% group_by(spice, cond) %>%
mutate_each(funs(group.norm))
You can exclude columns or specify only specific columns in mutate_each()
as well, like mutate_each(funs(group.norm), -notthisone)
or mutate_each(funs(group.norm), onlythisone)
Upvotes: 2