using dplyr calculate row ratio

I have a df :

id  sample1_1   sample1_2   sample2_1   sample2_2   sample2_3   sample3_1   sample3_2
honda   4.464274    7.087345    2.659297    83.513596   49.299961   22.991566   19.679316
audi    1.454645    2.784645    2.692656    14.010951   7.674361    3.84253 3.795233

What I would like to do is to calculate

ratio =4.464274/(4.464274+1.454645)*100 for each sample between honda and audi.

for each row and bind it to new df

desired output

id  sample1_1   sample1_2   sample2_1   sample2_2   sample2_3   sample3_1   sample3_2 ratio_sample1_1...sample3_1
    honda   4.464274    7.087345    2.659297    83.513596   49.299961   22.991566   19.679316
    audi    1.454645    2.784645    2.692656    14.010951   7.674361    3.84253 3.795233 

Is there any easy way to do this?

EDIT

standard deviation for sample replicates somthing like this but for each sample group

sample1_1_ratio     sample1_2_ratio     STD
75  71  sd(sample1_1_ratio,sample1_2_ratio) 
24  28  sd(sample1_1_ratio,sample1_2_ratio)

Upvotes: 0

Views: 3243

Answers (2)

Gopala
Gopala

Reputation: 10483

Here is a slightly different solution to get the same results, but organizing the data frame in a more manageable long format:

library(dplyr)
library(tidyr)
df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100)
# A tibble: 14 x 4
# Groups:   sample [7]
       id    sample     value    ratio
   <fctr>     <chr>     <dbl>    <dbl>
 1  honda sample1_1  4.464274 75.42381
 2   audi sample1_1  1.454645 24.57619
 3  honda sample1_2  7.087345 71.79247
 4   audi sample1_2  2.784645 28.20753
 5  honda sample2_1  2.659297 49.68835
 6   audi sample2_1  2.692656 50.31165
 7  honda sample2_2 83.513596 85.63341
 8   audi sample2_2 14.010951 14.36659
 9  honda sample2_3 49.299961 86.53014
10   audi sample2_3  7.674361 13.46986
11  honda sample3_1 22.991566 85.68042
12   audi sample3_1  3.842530 14.31958
13  honda sample3_2 19.679316 83.83256
14   audi sample3_2  3.795233 16.16744

If you want the standard deviation of the ratios, you can compute it as follows in the same pipe (mutates the value per row):

df %>% gather(sample, value, -id) %>% group_by(sample) %>% mutate(ratio = value / sum(value) * 100, sd_sample = sd(ratio))

If, you don't want values duplicated per row in group, you can run summarise(sdev = sd(ratio)) in a separate pipe.

Upvotes: 2

akuiper
akuiper

Reputation: 214957

You can use mutate_if with is.numeric to create new columns for all existing numeric ones:

df %>% mutate_if(is.numeric, funs(ratio = 100 * ./sum(.)))

#     id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 sample1_1_ratio sample1_2_ratio sample2_1_ratio sample2_2_ratio sample2_3_ratio sample3_1_ratio sample3_2_ratio
#1 honda  4.464274  7.087345  2.659297  83.51360 49.299961  22.99157 19.679316        75.42381        71.79247        49.68835        85.63341        86.53014        85.68042        83.83256
#2  audi  1.454645  2.784645  2.692656  14.01095  7.674361   3.84253  3.795233        24.57619        28.20753        50.31165        14.36659        13.46986        14.31958        16.16744

Or if the column names have start pattern for instance sample, you can also use mutate_at:

df %>% mutate_at(vars(starts_with('sample')), funs(ratio = 100 * ./sum(.)))

Upvotes: 3

Related Questions