Reputation: 905
I have some data that looks like this:
List_name Condition1 Condition2 Situation1 Situation2 List1 0.01 0.12 66 123 List2 0.23 0.22 45 -34 List3 0.32 0.23 13 -12 List4 0.03 0.56 -3 45 List5 0.56 0.05 12 100 List6 0.90 0.09 22 32
I would like to filter each column "Condition" of the data.frame according to a cut off 0.5. After the filter, the subset will occur and will carry the corresponding value of columns "Situation". Filter and subset will work pairwise: "Condition1" with "Situation1", "Condition2" with "Situation2" and so on.
Just the desired output:
List_name Condition1 Situation1 List_name Condition2 Situation2 List1 0.01 66 List1 0.12 123 List2 0.23 45 List2 0.22 -34 List3 0.32 13 List3 0.23 -12 List4 0.03 -3 List5 0.05 100 List6 0.09 32
I'm pretty sure that there's probably another similar situation posted before but I searched and I didn't find it.
Upvotes: 1
Views: 165
Reputation: 121568
Similar to excellent @Arun solution, but based on columns names and without any assumption.
cols.conds <- colnames(dat)[gregexpr(pattern='Condition[0-9]+',colnames(dat)) > 0]
lapply(cols.conds, function(x){
col.list <- colnames(dat)[1]
col.situ <- gsub('Condition','Situation',x)
dat[which(dat[[x]] < 0.5), c(col.list,x,col.situ)]}
)
I assume dat is :
dat <- read.table(text =' List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.02 22 32',head=T)
Upvotes: 2
Reputation: 118799
I think what you're asking for is attainable, but it can't be bind
(bound) in the way you've shown as they have unequal elements. So, you'll get a list.
Here, I assume that your data.frame
always is of the form List_name
, followed by a list of Condition1
, ... ,ConditionN
and then Situation1
, ..., SituationN
.
Then, this can be obtained by getting the ids
first and then filtering using lapply
ids <- grep("Condition", names(df))
lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1,x,x+length(ids))])
# [[1]]
# List_name Condition1 Situation1
# 1 List1 0.01 66
# 2 List2 0.23 45
# 3 List3 0.32 13
# 4 List4 0.03 -3
#
# [[2]]
# List_name Condition2 Situation2
# 1 List1 0.12 123
# 2 List2 0.22 -34
# 3 List3 0.23 -12
# 5 List5 0.05 100
# 6 List6 0.09 32
Upvotes: 1
Reputation: 43255
You can use the notion that boolean checks are vectorized:
x <- c(0.1, 0.3, 0.5, 0.2)
x < 0.5
# [1] TRUE TRUE FALSE TRUE
And some grep
results:
grep('Condition', names(DF1))
To do this subsetting you can use apply
to generate your boolean vector:
keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) any(x < 0.5))
And subset:
DF1[keepers,]
Notice that this doesn't necessarily return the data structure you showed in your question. But you can alter the anonymous function accordingly using all
or a different threshold value.
In lieu of the edits, I would approach this differently. I would use melt
from the reshape2
package:
library(reshape2)
dat.c <- melt(DF1,
id.var='List_name',
measure.var=grep('Condition', names(DF1), value=TRUE),
variable.name='condition',
value.name='cond.val')
dat.c$idx <- gsub('Condition', '', dat.c$condition)
dat.s <- melt(DF1,
id.var='List_name',
measure.var=grep('Situation', names(DF1), value=TRUE),
variable.name='situation',
value.name='situ.val')
dat.s$idx <- gsub('Situation', '', dat.s$situation)
dat <- merge(dat.c, dat.s)
out <- dat[dat$cond.val < 0.5,]
List_name idx condition cond.val situation situ.val
1 List1 1 Condition1 0.01 Situation1 66
2 List1 2 Condition2 0.12 Situation2 123
3 List2 1 Condition1 0.23 Situation1 45
4 List2 2 Condition2 0.22 Situation2 -34
5 List3 1 Condition1 0.32 Situation1 13
6 List3 2 Condition2 0.23 Situation2 -12
7 List4 1 Condition1 0.03 Situation1 -3
10 List5 2 Condition2 0.05 Situation2 100
12 List6 2 Condition2 0.09 Situation2 32
You can then use dcast
to put the data back in the initial format if you want, but I find data in this "long" form much easier to work with. This form is also pleasant since it avoids the need for NA values where you have rows where one condition is met and others are not.
out.c <- dcast(out, List_name ~ condition, value.var='cond.val')
out.s <- dcast(out, List_name ~ situation, value.var='situ.val')
merge(out.c, out.s)
List_name Condition1 Condition2 Situation1 Situation2
1 List1 0.01 0.12 66 123
2 List2 0.23 0.22 45 -34
3 List3 0.32 0.23 13 -12
4 List4 0.03 NA -3 NA
5 List5 NA 0.05 NA 100
6 List6 NA 0.09 NA 32
Upvotes: 1