Reputation: 704
I have a df :
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
What I would like to do is to calculate
ratio =4.464274/(4.464274+1.454645)*100 for each sample between honda and audi.
for each row and bind it to new df
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 ratio_sample1_1...sample3_1
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
Is there any easy way to do this?
standard deviation for sample replicates somthing like this but for each sample group
sample1_1_ratio sample1_2_ratio STD
75 71 sd(sample1_1_ratio,sample1_2_ratio)
24 28 sd(sample1_1_ratio,sample1_2_ratio)
Upvotes: 0
Views: 3243
Reputation: 10483
Here is a slightly different solution to get the same results, but organizing the data frame in a more manageable long format:
library(dplyr)
library(tidyr)
df %>%
gather(sample, value, -id) %>%
group_by(sample) %>%
mutate(ratio = value / sum(value) * 100)
# A tibble: 14 x 4
# Groups: sample [7]
id sample value ratio
<fctr> <chr> <dbl> <dbl>
1 honda sample1_1 4.464274 75.42381
2 audi sample1_1 1.454645 24.57619
3 honda sample1_2 7.087345 71.79247
4 audi sample1_2 2.784645 28.20753
5 honda sample2_1 2.659297 49.68835
6 audi sample2_1 2.692656 50.31165
7 honda sample2_2 83.513596 85.63341
8 audi sample2_2 14.010951 14.36659
9 honda sample2_3 49.299961 86.53014
10 audi sample2_3 7.674361 13.46986
11 honda sample3_1 22.991566 85.68042
12 audi sample3_1 3.842530 14.31958
13 honda sample3_2 19.679316 83.83256
14 audi sample3_2 3.795233 16.16744
If you want the standard deviation of the ratios, you can compute it as follows in the same pipe (mutates the value per row):
df %>% gather(sample, value, -id) %>% group_by(sample) %>% mutate(ratio = value / sum(value) * 100, sd_sample = sd(ratio))
If, you don't want values duplicated per row in group, you can run summarise(sdev = sd(ratio))
in a separate pipe.
Upvotes: 2
Reputation: 214957
You can use mutate_if
with is.numeric
to create new columns for all existing numeric ones:
df %>% mutate_if(is.numeric, funs(ratio = 100 * ./sum(.)))
# id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 sample1_1_ratio sample1_2_ratio sample2_1_ratio sample2_2_ratio sample2_3_ratio sample3_1_ratio sample3_2_ratio
#1 honda 4.464274 7.087345 2.659297 83.51360 49.299961 22.99157 19.679316 75.42381 71.79247 49.68835 85.63341 86.53014 85.68042 83.83256
#2 audi 1.454645 2.784645 2.692656 14.01095 7.674361 3.84253 3.795233 24.57619 28.20753 50.31165 14.36659 13.46986 14.31958 16.16744
Or if the column names have start pattern for instance sample
, you can also use mutate_at
:
df %>% mutate_at(vars(starts_with('sample')), funs(ratio = 100 * ./sum(.)))
Upvotes: 3