Reputation: 31
I have a 38467*59 data frame. Here is a subset of the whole dataset
mydata.2
Unique.Groups Protein.group.IDs Intensity.R1 Intensity.R2
1 yes 0 9701600 17971000
2 yes 0 0 29308000
3 yes 1 49083000 75300000
4 yes 2 53926000 13926000
5 yes 3 35255000 35750000
6 yes 3 302600000 295410000
7 yes 4 72766000 51914000
8 yes 5 33657000 31027000
9 yes 5 16504000 20689000
10 yes 5 191170000 209230000
11 yes 5 155710000 150280000
12 yes 5 402890000 529300000
13 yes 5 633480000 522180000
14 yes 5 92339000 0
15 yes 6 0 30646000
16 yes 6 67494000 92326000
17 yes 6 20205000 0
18 yes 7 94183000 76848000
19 yes 7 15584000 0
20 yes 7 27247000 38961000
21 yes 7 98433000 390870000
22 yes 8 0 36542000
23 yes 8 83978000 0
24 yes 8 53744000 95173000
25 yes 8 0 53494000
I applied the following piece of code to Intensity.R1 column and I'd like to apply it automatically to all the Intensities column of the data frame. At each round I need to run the same code on Intensity.R1, Intensity.R2, and so on.
data.normalized <- filter(mydata.2, Unique.Groups == "yes") %>%
group_by(Protein.group.IDs) %>%
arrange(desc(Intensity.R1)) %>%
top_n(3, Intensity.R1) %>%
mutate(sum.top.3.R1 = sum(Intensity.R1)) %>%
filter(!duplicated(Protein.group.IDs)) %>%
drop.levels() %>%
arrange(Protein.group.IDs)
Each round should generate a new column (i.e sum.top.3) that I want to bind together in a final table. Here is an example just for the Intensity.R1 column.
data.normalized
Source: local data frame [9 x 5]
Groups: Protein.group.IDs
Unique.Groups Protein.group.IDs Intensity.R1 Intensity.R2 sum.top.3.R1
1 yes 0 9701600 17971000 9701600
2 yes 1 49083000 75300000 49083000
3 yes 2 53926000 13926000 53926000
4 yes 3 302600000 295410000 337855000
5 yes 4 72766000 51914000 72766000
6 yes 5 633480000 522180000 1227540000
7 yes 6 67494000 92326000 87699000
8 yes 7 98433000 390870000 219863000
9 yes 8 83978000 0 137722000
Upvotes: 0
Views: 2813
Reputation: 16121
I've created a simple example with same column names for you to modify.
library(dplyr)
library(gdata)
mydata.2 = data.frame(Unique.Groups = rep("yes",8),
Protein.group.IDs = c(1,1,1,1,2,2,2,2),
Intensity.R1 = c(10,40,20,30,40,50,70,60),
Intensity.R2 = c(70,40,50,60,40,10,20,30))
# function that orders a vector and keeps top 3 values and calculates the sum
ff = function(x) {x[order(-x)][1:3] %>% sum()}
filter(mydata.2, Unique.Groups == "yes") %>%
group_by(Protein.group.IDs) %>%
select(-Unique.Groups) %>% # keep only grouping variable and the ones you need to calculate
summarise_each(funs(ff))
Even in your code if you have used "summarise" instead of "mutate" you wouldn't have to to filter out duplicated ids. I don't know if I'm missing something with the drop.levels here. Also, i don't really think you needed the first "arrange" as top_n will keep the top 3, but not in the right order (which is not a problem here).
Upvotes: 1