Reputation: 25
I have a series of data frames representing separate molecules (F0001, F0002, ...) that contain hundreds/thousands of scores from experiments using that molecule. Each data frame looks like this.
F0001
PoseID Score
1 AAAA_1 -13.70
2 AAAA_2 -9.21
3 AAAA_3 -7.60
4 AAAA_4 -6.28
5 ....
F0002
PoseID Score
1 AAAB_1 -14.90
2 AAAB_2 -13.92
3 AAAB_3 -13.49
4 AAAB_4 -11.95
5 ....
etc., etc.
Based on a cut-off, I'd like to subset the data to throw out any poses that fall above that cut-off, so a simple binary comparison. A slight complicating factor is that the cut-off differs for each of (F0001, F0002, ...), so I've gone ahead and stored those in a data frame (let's call it cutoffs).
cutoffs
FragmentID ScoreCutOff
1 F0001 -9.69
2 F0002 -9.33
3 F0003 -8.50
4 ....
So I guess the question becomes: do I perform the comparison between cutoffs and each data frame individually, or add all the data frames to a list and perform the comparison between cutoffs and the list of data frames there?
I'm feeling that Ari Friedman's answer is in the ballpark, so I'm tooling about with sapply/any to get it working. Usually one solves this sort of problem quite easily with nested loops and data structures in Python/C++/Java, but I'm new to doing it in R, so I'm keen to hear any other ideas people have. Of course, if I solve it myself in the interim, I will post the solution for critique.
Upvotes: 1
Views: 588
Reputation: 4472
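Assuming df1, df2 are your data frames, you could try this using lapply:
dflist = list(df1, df2)
# name each data frame after its fragment (assumes cutoffs has one row per data frame, in the same order)
names(dflist) = as.character(cutoffs$FragmentID)
out = lapply(names(dflist),
function(x){
# look up the cut-off row for this fragment
cfval = subset(cutoffs, FragmentID %in% x)
# keep only the poses scoring below the cut-off
subset(dflist[[x]], Score < cfval$ScoreCutOff)
})
names(out) = names(dflist)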
#> out
#$F0001
# PoseID Score
#1 AAAA_1 -13.7
#
#$F0002
# PoseID Score
#1 AAAB_1 -14.90
#2 AAAB_2 -13.92
#3 AAAB_3 -13.49
#4 AAAB_4 -11.95
Later, if you want to have all the data frames separately, you could do this:
# data-frames with names F0001, F0002, ....
list2env(out,.GlobalEnv)
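As a small illustration of the list2env step, here is a sketch that unpacks a named list into a fresh environment rather than .GlobalEnv. The list contents and the name frag_env are made up for the example; the only point is that each element becomes a variable named after its list name, which is why out needs names first.

```r
# Hypothetical stand-in for the named list `out` built above.
out <- list(F0001 = data.frame(PoseID = "AAAA_1", Score = -13.70),
            F0002 = data.frame(PoseID = c("AAAB_1", "AAAB_2"),
                               Score  = c(-14.90, -13.92)))

# list2env() creates one variable per list element, named after the element.
# A fresh environment (frag_env, an illustrative name) is used instead of
# .GlobalEnv so that existing objects are not overwritten.
frag_env <- new.env()
list2env(out, envir = frag_env)

ls(frag_env)             # names of the objects that were created
get("F0001", frag_env)   # retrieve one data frame by name
```

Unpacking into .GlobalEnv works the same way, but it silently replaces any existing objects with the same names.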
Upvotes: 0
Reputation: 7659
Based upon the information you provide, something like this should do the job:
# bring your data.frames into a list:
f <- list( F0001, F0002 )
> f
[[1]]
PoseID Score
1 AAAA_1 -13.70
2 AAAA_2 -9.21
3 AAAA_3 -7.60
4 AAAA_4 -6.28
[[2]]
PoseID Score
1 AAAB_1 -14.90
2 AAAB_2 -13.92
3 AAAB_3 -13.49
4 AAAB_4 -11.95
# subset per list item
for( i in 1 : length( f ) )
f[[ i ]] <- f[[ i ]][ f[[ i ]][ 2 ] < cutoffs[ i, 2 ], ]
> f
[[1]]
PoseID Score
1 AAAA_1 -13.7
[[2]]
PoseID Score
1 AAAB_1 -14.90
2 AAAB_2 -13.92
3 AAAB_3 -13.49
4 AAAB_4 -11.95
Not sure what you mean by "above cut-off"; maybe you have to reverse the less-than (<) operation. I also assume that the rows in cutoffs are in exactly the same order as the list of data.frames; otherwise some additional operation to identify the corresponding cut-off may be necessary.
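If the order cannot be relied on, one possible way to identify the corresponding cut-off is to name the list elements and look the rows up with match(). This is only a sketch using the toy values from the question; the names idx and filtered are made up here, and the cutoffs rows are shuffled on purpose to show that position no longer matters.

```r
# Toy data copied from the question.
F0001 <- data.frame(PoseID = c("AAAA_1", "AAAA_2", "AAAA_3", "AAAA_4"),
                    Score  = c(-13.70, -9.21, -7.60, -6.28))
F0002 <- data.frame(PoseID = c("AAAB_1", "AAAB_2", "AAAB_3", "AAAB_4"),
                    Score  = c(-14.90, -13.92, -13.49, -11.95))

# Rows deliberately shuffled so that position does not match the list order.
cutoffs <- data.frame(FragmentID  = c("F0003", "F0002", "F0001"),
                      ScoreCutOff = c(-8.50, -9.33, -9.69))

# A named list, so each data frame carries its fragment ID.
f <- list(F0001 = F0001, F0002 = F0002)

# match() gives, for each list name, the row of cutoffs holding that ID.
idx <- match(names(f), cutoffs$FragmentID)
stopifnot(!anyNA(idx))  # fail early if a fragment has no cut-off

# Map() walks the data frames and their cut-offs in parallel and keeps
# the list names on the result.
filtered <- Map(function(df, cut) df[df$Score < cut, , drop = FALSE],
                f, cutoffs$ScoreCutOff[idx])
```

With these values, filtered$F0001 keeps only AAAA_1 and filtered$F0002 keeps all four poses, matching the output of the loop above.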
Upvotes: 1