Reputation: 560
I have tried to find a solution to this for hours. I have tried to search SO, and should I have overlooked an answer for this, please close this as duplicate.
I have a matrix, sorted by transcript_id, then cond:
transcript_id cond expr
A1 B1 40
A1 B2 30
A1 B3 20
A2 B2 35
A2 B3 45
A3 B1 23
A4 B1 64
A4 B3 43
I would like a new column, where the fraction of expr within each transcript_id is listed:
transcript_id cond expr frac
A1 B1 40 0.4444
A1 B2 30 0.3333
A1 B3 20 0.2222
A2 B2 35 0.4375
A2 B3 45 0.5625
A3 B1 23 1
A4 B1 64 0.5981
A4 B3 43 0.4019
Is there a smart way to achieve this?
My naive approach would be to write a function that loops over every unique element in transcript_id, but I am stuck.
Note that not every transcript_id is represented by all three cond.
Upvotes: 4
Views: 8544
Reputation: 849
To solve your problem:
1. Group by the transcript_id column.
2. Create the required column. Below are two ways to do this, using the dplyr and plyr packages.
Using ***dplyr***:
library(dplyr)  # attaches the %>% pipe used below
dataset %>%
  dplyr::group_by(transcript_id) %>%
  dplyr::mutate(frac = round(expr / sum(expr), 4))
Using ***plyr***:
# mutate (rather than summarise) keeps cond and expr next to the new column
plyr::ddply(dataset, "transcript_id", plyr::mutate,
            frac = round(expr / sum(expr), 4))
Upvotes: 4
Reputation: 37879
One way with data.table:
library(data.table)
#setDT converts to a data.table and then you calculate the fraction of each expr
#grouping by the transcript_id
setDT(df)[, frac := expr / sum(expr), by=transcript_id]
Output:
> df
transcript_id cond expr frac
1: A1 B1 40 0.4444444
2: A1 B2 30 0.3333333
3: A1 B3 20 0.2222222
4: A2 B2 35 0.4375000
5: A2 B3 45 0.5625000
6: A3 B1 23 1.0000000
7: A4 B1 64 0.5981308
8: A4 B3 43 0.4018692
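For comparison, the same grouped fraction can probably be computed without any extra package. The following is only a minimal sketch in base R using ave(); the data frame name df and the way the example data is rebuilt here are assumptions taken from the question's table, not something the asker posted as code.

```r
# Rebuild the example data from the question (the name `df` is an assumption).
df <- data.frame(
  transcript_id = c("A1", "A1", "A1", "A2", "A2", "A3", "A4", "A4"),
  cond          = c("B1", "B2", "B3", "B2", "B3", "B1", "B1", "B3"),
  expr          = c(40, 30, 20, 35, 45, 23, 64, 43)
)

# ave() applies FUN within each transcript_id group and returns a vector
# aligned with the original rows, so it can be assigned as a new column.
df$frac <- ave(df$expr, df$transcript_id, FUN = function(x) x / sum(x))

print(df)
```

Because ave() keeps the original row order, groups with fewer than three cond values (such as A3) need no special handling; wrap the result in round(..., 4) if the shorter format from the question is wanted.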
Upvotes: 4