r - subset multiple data frames (or one list) based on cut-offs from a reference data frame

Question

I have a series of data frames representing separate molecules (F0001, F0002, ...) that each contain hundreds or thousands of scores from experiments using that molecule. Each data frame looks like this.

F0001

    PoseID  Score
1   AAAA_1  -13.70
2   AAAA_2  -9.21
3   AAAA_3  -7.60
4   AAAA_4  -6.28
5   ....

F0002

    PoseID  Score
1   AAAB_1  -14.90
2   AAAB_2  -13.92
3   AAAB_3  -13.49
4   AAAB_4  -11.95
5   ....

etc., etc.

Based on a cut-off, I'd like to subset the data to throw out any of the poses that fall above said cut-off, so, a simple binary comparison. A slight complicating factor is that the cut-off differs for each of (F0001, F0002, ...), so I've gone ahead and stored those in a data frame (let's call it cutoffs).

cutoffs

     FragmentID     ScoreCutOff
1    F0001          -9.69
2    F0002          -9.33
3    F0003          -8.50
4    ....

So I guess the question becomes: do I perform the comparison between cutoffs and each data frame separately, or add all the data frames to a list and perform the comparison between cutoffs and that list?

I'm feeling that Ari Friedman's answer is in the ballpark, so I'm tooling about with sapply/any to get it working. One usually solves this sort of problem quite easily with nested loops and data structures in Python/CPP/Java, but I'm new to doing it in R, so I'm keen to hear any other ideas people have. Of course, if I solve it myself in the interim, I will post the solution for critique.

Veerendra Gadekar · Accepted Answer

Assuming df1 and df2 are your data frames, you could try this using lapply:

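# assumes df1, df2 are in the same order as the rows of cutoffs
dflist = list(df1, df2)
names(dflist) = as.character(cutoffs$FragmentID[seq_along(dflist)])

out = lapply(names(dflist), 
      function(x){ 
        # cut-off row for this fragment
        cfval = subset(cutoffs, FragmentID %in% x)
        # keep only the poses scoring below the cut-off
        subset(dflist[[x]], Score < cfval$ScoreCutOff)
      })

names(out) = names(dflist)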

#> out
#$F0001
#  PoseID Score
#1 AAAA_1 -13.7
# 
#$F0002
#  PoseID  Score
#1 AAAB_1 -14.90
#2 AAAB_2 -13.92
#3 AAAB_3 -13.49
#4 AAAB_4 -11.95
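
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example data from the question (only the four rows shown for each frame) and swaps lapply for Map with a match() lookup, so each cut-off is found by fragment name rather than by position. The choice of Map and match is my assumption about what reads most clearly here, not something the question asked for.

```r
# Example data copied from the question (first four rows of each frame).
F0001 <- data.frame(PoseID = c("AAAA_1", "AAAA_2", "AAAA_3", "AAAA_4"),
                    Score  = c(-13.70, -9.21, -7.60, -6.28))
F0002 <- data.frame(PoseID = c("AAAB_1", "AAAB_2", "AAAB_3", "AAAB_4"),
                    Score  = c(-14.90, -13.92, -13.49, -11.95))
cutoffs <- data.frame(FragmentID  = c("F0001", "F0002", "F0003"),
                      ScoreCutOff = c(-9.69, -9.33, -8.50))

# Name each list element after its fragment so cut-offs can be looked up by name.
dflist <- list(F0001 = F0001, F0002 = F0002)

# Row of cutoffs belonging to each frame; match() gives NA for an unknown fragment.
idx <- match(names(dflist), cutoffs$FragmentID)
stopifnot(!anyNA(idx))

# Map() walks the frames and their cut-offs in parallel and keeps the list names.
out <- Map(function(df, cut) df[df$Score < cut, , drop = FALSE],
           dflist, cutoffs$ScoreCutOff[idx])
```

Because the lookup is by name, it does not matter that cutoffs has more rows (F0003, ...) than there are data frames in the list, or that the rows are in a different order.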

Later, if you want to have all the data frames separately, you could do this:

# creates data frames named F0001, F0002, ... in the global environment
list2env(out,.GlobalEnv)
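
On the "separate data frames or one list" part of the question, a third option is to stack everything into one long data frame and filter once. The sketch below is only an illustration under the same assumptions as the question's example data; the names stacked, merged and kept are mine.

```r
# Example data copied from the question (first four rows of each frame).
F0001 <- data.frame(PoseID = c("AAAA_1", "AAAA_2", "AAAA_3", "AAAA_4"),
                    Score  = c(-13.70, -9.21, -7.60, -6.28))
F0002 <- data.frame(PoseID = c("AAAB_1", "AAAB_2", "AAAB_3", "AAAB_4"),
                    Score  = c(-14.90, -13.92, -13.49, -11.95))
cutoffs <- data.frame(FragmentID  = c("F0001", "F0002", "F0003"),
                      ScoreCutOff = c(-9.69, -9.33, -8.50))
dflist <- list(F0001 = F0001, F0002 = F0002)

# Stack the frames, recording which fragment each row came from.
stacked <- do.call(rbind, Map(function(df, id) cbind(FragmentID = id, df),
                              dflist, names(dflist)))

# Attach each row's cut-off by fragment, then filter in a single comparison.
merged <- merge(stacked, cutoffs, by = "FragmentID")
kept <- merged[merged$Score < merged$ScoreCutOff,
               c("FragmentID", "PoseID", "Score")]
```

If a list is needed again afterwards, split(kept, kept$FragmentID) returns one data frame per fragment.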
