Reputation: 485
I have a table like this one:
data <- data.frame(a = c("0/0", "0/1", "0/0", "0/0" ),
b = c("0/1", "./.", "0/1", "0/0"),
c = c("1/0", "0/0", "1/1", "0/0"),
d = c("1/0", "0/0", "1/1", "0/0"),
f = c("L", "L", "T", "L"))
I would like to select any row that contains at least one 0/1 or 1/0 and no ./. in the columns a, b and c, and that match L in the column f.
I am trying this using the library data.table
data[data$a %like% "0/1|1/0" | data$b %like% "0/1|1/0"| data$c %like% "0/1|1/0" & !(data$a %like% "./.") & !(data$b %like% "./.") & !(data$c %like% "./.") & data$f == "L", ]
But it is not working.
The table looks like this:
a b c d f
1 0/0 0/1 1/0 1/0 L
2 0/1 ./. 0/0 0/0 L
3 0/0 0/1 1/1 1/1 T
4 0/0 0/0 0/0 0/0 L
And the desired output should look like this:
a b c d f
1 0/0 0/1 1/0 1/0 L
Do you know how I could achive this?
Upvotes: 1
Views: 92
Reputation: 160407
data[ apply(sapply(data[1:4], `%in%`, c('0/1','1/0')), 1, any) &
apply(sapply(data[1:3], Negate(`%in%`), c('./.')), 1, all) &
data$f == "L", ]
# a b c d f
# 1 0/0 0/1 1/0 1/0 L
Broken down:
sapply(data[1:4], `%in%`, c('0/1','1/0'))
# a b c d
# [1,] FALSE TRUE TRUE TRUE
# [2,] TRUE FALSE FALSE FALSE
# [3,] FALSE TRUE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE
That gives us the instances in the first four columns with one of the two "wanted" patterns. We want rows where any of the columns have it, so we "any" across them:
apply(sapply(data[1:4], `%in%`, c('0/1','1/0')), 1, any)
# [1] TRUE TRUE TRUE FALSE
Similarly, find those with the "not wanted" patterns:
sapply(data[1:3], Negate(`%in%`), c('./.'))
# a b c
# [1,] TRUE TRUE TRUE
# [2,] TRUE FALSE TRUE
# [3,] TRUE TRUE TRUE
# [4,] TRUE TRUE TRUE
apply(sapply(data[1:3], Negate(`%in%`), c('./.')), 1, all) # notice "all", not "any"
# [1] TRUE FALSE TRUE TRUE
Now we want "L" in the last column (=="L"
), straight-forward to chain them logically with &
.
Upvotes: 2
Reputation: 1562
Similar to the previous answer:
apply(data[, 1:4], 1, function(a) any(a %in% c("0/1","1/0")) && !any(a[1:3] == "./.")) & data$f == "L"
Upvotes: 1
Reputation: 513
You can first concatenate the columns a
, b
, c
, and d
together.
data[, abcd := paste(a, b, c, d)]
Then, I would create another new column which will tell me whether the conditions are met. I first set it to all FALSE
.
data[, Selection := F]
Next, only for the columns that meet the conditions are being assigned TRUE
.
(1) having "0/1"
or "1/0"
in a
, b
, c
, or d
, or simply abcd
(2) not having "./."
in abcd
(3) having "L"
in column f
data[(grepl("0/1", abcd) | grepl("1/0", abcd)) & !grepl("\\./\\.", abcd) &
f == "L",
Selection := T]
From here, I can select the row(s) needed by
data[(Selection), ]
Upvotes: 0
Reputation: 11140
Here's a simple solution using apply
from base R -
test <- apply(data, 1, function(x) {
any(x %in% c("0/1", "1/0")) & !any(x == "./.") & x["f"] == "L"
})
data[test, ]
# a b c d f
# 1 0/0 0/1 1/0 1/0 L
Upvotes: 0
Reputation: 25225
Another option:
data[f=="L", .SD[apply((.SD=="1/0" | .SD=="0/1") & !apply(.SD=="./.",1,any), 1, any)],
.SDcols=c("a","b","c","d")]
Upvotes: 1
Reputation: 3194
This is by no means faster than a working data.table solution but this works with base R:
dat <- data.frame(a = c("0/0", "0/1", "0/0", "0/0" ),
b = c("0/1", "./.", "0/1", "0/0"),
c = c("1/0", "0/0", "1/1", "0/0"),
d = c("1/0", "0/0", "1/1", "0/0"),
f = c("L", "L", "T", "L"))
dat
f <- which(colnames(dat) == 'f')
rows <- apply(dat, 1, function(x) x[f] == "L" & !any("./." == x[-f]) & any("0/1" == x[-f]) | any("1/0" == x[-f]) )
dat[rows,]
It uses the apply function to apply a function by row on the data.
Upvotes: 1