Mads Obi

Reputation: 560

R: calculating fraction of values in column, grouped by value in another column

I have tried to find a solution to this for hours. I have tried to search SO, and should I have overlooked an answer for this, please close this as duplicate.

I have a matrix, sorted by transcript_id, then cond:

transcript_id    cond    expr
A1               B1      40
A1               B2      30
A1               B3      20
A2               B2      35
A2               B3      45
A3               B1      23
A4               B1      64
A4               B3      43

I would like a new column, where the fraction of expr within each transcript_id is listed:

transcript_id    cond    expr   frac
A1               B1      40     0.4444
A1               B2      30     0.3333
A1               B3      20     0.2222
A2               B2      35     0.4375
A2               B3      45     0.5625
A3               B1      23     1
A4               B1      64     0.5981
A4               B3      43     0.4019

Is there a smart way to achieve this?

My naive approach would be to write a function that loops over every unique element in transcript_id, but I am stuck. Note that not every transcript_id is represented by all three cond.

Upvotes: 4

Views: 8544

Answers (2)

dondapati

Reputation: 849

To solve your problem, consider the following:

1. Group the data by your transcript_id column.

2. Create the required column within each group. This can be done with either the dplyr or the plyr package; I show both ways.

Using ***dplyr***:

library(dplyr)  # provides %>%, group_by() and mutate()

dataset %>%
  group_by(transcript_id) %>%
  mutate(frac = round(expr / sum(expr), 4))

Using ***plyr***:

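# mutate (rather than summarise) keeps the cond and expr columns;
# passing the grouping column as a string avoids needing plyr attached for .()
plyr::ddply(dataset, "transcript_id", plyr::mutate,
            frac = round(expr / sum(expr), 4))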
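
If neither package is available, the same per-group fraction can probably be obtained in base R with ave(). The following is only a minimal sketch: the data frame is rebuilt by hand from the table in the question, the name `dataset` simply follows the snippets above, and `group_totals` is a hypothetical helper name used for a sanity check.

```r
# Rebuild the table from the question (values copied by hand)
dataset <- data.frame(
  transcript_id = c("A1", "A1", "A1", "A2", "A2", "A3", "A4", "A4"),
  cond          = c("B1", "B2", "B3", "B2", "B3", "B1", "B1", "B3"),
  expr          = c(40, 30, 20, 35, 45, 23, 64, 43)
)

# ave() applies FUN within each transcript_id group and returns a vector
# aligned with the original rows, so it can be assigned as a new column
dataset$frac <- round(
  ave(dataset$expr, dataset$transcript_id, FUN = function(x) x / sum(x)),
  4
)

# Sanity check: the fractions of each transcript_id should sum to about 1
# (not exactly 1, because of the rounding to 4 digits)
group_totals <- tapply(dataset$frac, dataset$transcript_id, sum)
```

Because ave() keeps the row order, this also works when a transcript_id is missing some cond values, as in the question's data.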

Upvotes: 4

LyzandeR

Reputation: 37879

One way with data.table:

library(data.table)
#setDT converts to a data.table and then you calculate the fraction of each expr
#grouping by the transcript_id
setDT(df)[, frac := expr / sum(expr), by=transcript_id]

Output:

> df
   transcript_id cond expr      frac
1:            A1   B1   40 0.4444444
2:            A1   B2   30 0.3333333
3:            A1   B3   20 0.2222222
4:            A2   B2   35 0.4375000
5:            A2   B3   45 0.5625000
6:            A3   B1   23 1.0000000
7:            A4   B1   64 0.5981308
8:            A4   B3   43 0.4018692

Upvotes: 4
