Fuv8
Fuv8

Reputation: 905

partially match a data.frame and subset all the data.frame

I have some data that looks like this:

 List_name     Condition1   Condition2  Situation1   Situation2
  List1          0.01         0.12         66           123
  List2          0.23         0.22         45           -34
  List3          0.32         0.23         13           -12
  List4          0.03         0.56         -3            45
  List5          0.56         0.05         12           100
  List6          0.90         0.09         22            32

I would like to filter each column "Condition" of the data.frame according to a cut off 0.5. After the filter, the subset will occur and will carry the corresponding value of columns "Situation". Filter and subset will work pairwise: "Condition1" with "Situation1", "Condition2" with "Situation2" and so on.

Just the desired output:

 List_name     Condition1     Situation1  List_name  Condition2  Situation2
  List1          0.01             66       List1        0.12        123
  List2          0.23             45       List2        0.22        -34
  List3          0.32             13       List3        0.23        -12
  List4          0.03             -3       List5        0.05        100
                                           List6        0.09         32

I'm pretty sure that there's probably another similar situation posted before but I searched and I didn't find it.

Upvotes: 1

Views: 165

Answers (3)

agstudy
agstudy

Reputation: 121568

Similar to excellent @Arun solution, but based on columns names and without any assumption.

cols.conds <- colnames(dat)[gregexpr(pattern='Condition[0-9]+',colnames(dat)) > 0]
lapply(cols.conds, function(x){
   col.list <- colnames(dat)[1]
   col.situ <- gsub('Condition','Situation',x)
   dat[which(dat[[x]] < 0.5), c(col.list,x,col.situ)]}
)

I assume dat is :

dat <- read.table(text =' List_name     Condition1   Condition2  Situation1   Situation2
  List1          0.01         0.12         66           123
  List2          0.23         0.22         45           -34
  List3          0.32         0.23         13           -12
  List4          0.03         0.56         -3            45
  List5          0.56         0.05         12           100
  List6          0.90         0.02         22            32',head=T)

Upvotes: 2

Arun
Arun

Reputation: 118799

I think what you're asking for is attainable, but it can't be bind(bound) in the way you've shown as they have unequal elements. So, you'll get a list.

Here, I assume that your data.frame always is of the form List_name, followed by a list of Condition1, ... ,ConditionN and then Situation1, ..., SituationN.

Then, this can be obtained by getting the ids first and then filtering using lapply

ids <- grep("Condition", names(df))
lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1,x,x+length(ids))])

# [[1]]
#   List_name Condition1 Situation1
# 1     List1       0.01         66
# 2     List2       0.23         45
# 3     List3       0.32         13
# 4     List4       0.03         -3
# 
# [[2]]
#   List_name Condition2 Situation2
# 1     List1       0.12        123
# 2     List2       0.22        -34
# 3     List3       0.23        -12
# 5     List5       0.05        100
# 6     List6       0.09         32

Upvotes: 1

Justin
Justin

Reputation: 43255

You can use the notion that boolean checks are vectorized:

x <- c(0.1, 0.3, 0.5, 0.2)
x < 0.5
# [1]  TRUE  TRUE FALSE  TRUE

And some grep results:

grep('Condition', names(DF1))

To do this subsetting you can use apply to generate your boolean vector:

keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) any(x < 0.5))

And subset:

DF1[keepers,]

Notice that this doesn't necessarily return the data structure you showed in your question. But you can alter the anonymous function accordingly using all or a different threshold value.


In lieu of the edits, I would approach this differently. I would use melt from the reshape2 package:

library(reshape2)
dat.c <- melt(DF1, 
              id.var='List_name', 
              measure.var=grep('Condition', names(DF1), value=TRUE),
              variable.name='condition',
              value.name='cond.val')
dat.c$idx <- gsub('Condition', '', dat.c$condition)
dat.s <- melt(DF1, 
              id.var='List_name', 
              measure.var=grep('Situation', names(DF1), value=TRUE),
              variable.name='situation',
              value.name='situ.val')
dat.s$idx <- gsub('Situation', '', dat.s$situation)
dat <- merge(dat.c, dat.s)

out <- dat[dat$cond.val < 0.5,]

   List_name idx  condition cond.val  situation situ.val
1      List1   1 Condition1     0.01 Situation1       66
2      List1   2 Condition2     0.12 Situation2      123
3      List2   1 Condition1     0.23 Situation1       45
4      List2   2 Condition2     0.22 Situation2      -34
5      List3   1 Condition1     0.32 Situation1       13
6      List3   2 Condition2     0.23 Situation2      -12
7      List4   1 Condition1     0.03 Situation1       -3
10     List5   2 Condition2     0.05 Situation2      100
12     List6   2 Condition2     0.09 Situation2       32

You can then use dcast to put the data back in the initial format if you want, but I find data in this "long" form much easier to work with. This form is also pleasant since it avoids the need for NA values where you have rows where one condition is met and others are not.

out.c <- dcast(out, List_name ~ condition, value.var='cond.val')
out.s <- dcast(out, List_name ~ situation, value.var='situ.val')
merge(out.c, out.s)

  List_name Condition1 Condition2 Situation1 Situation2
1     List1       0.01       0.12         66        123
2     List2       0.23       0.22         45        -34
3     List3       0.32       0.23         13        -12
4     List4       0.03         NA         -3         NA
5     List5         NA       0.05         NA        100
6     List6         NA       0.09         NA         32

Upvotes: 1

Related Questions