Eric González
Eric González

Reputation: 485

Select rows based on a combination of matched and not matched columns

I have a table like this one:

 data <- data.frame(a = c("0/0", "0/1", "0/0", "0/0" ),
                    b = c("0/1", "./.", "0/1", "0/0"),
                    c = c("1/0", "0/0", "1/1", "0/0"),
                    d = c("1/0", "0/0", "1/1", "0/0"),                       
                    f = c("L", "L", "T", "L"))

I would like to select any row that contains at least one 0/1 or 1/0 and no ./. in the columns a, b and c, and that match L in the column f.

I am trying this using the library data.table

data[data$a %like% "0/1|1/0" | data$b %like% "0/1|1/0"| data$c %like% "0/1|1/0" & !(data$a %like% "./.") & !(data$b %like% "./.") & !(data$c %like% "./.") & data$f == "L", ]

But it is not working.

The table looks like this:

       a   b   c   d   f
    1 0/0 0/1 1/0 1/0  L
    2 0/1 ./. 0/0 0/0  L
    3 0/0 0/1 1/1 1/1  T
    4 0/0 0/0 0/0 0/0  L

And the desired output should look like this:

       a   b   c   d   f
    1 0/0 0/1 1/0 1/0  L

Do you know how I could achive this?

Upvotes: 1

Views: 92

Answers (6)

r2evans
r2evans

Reputation: 160407

data[ apply(sapply(data[1:4], `%in%`, c('0/1','1/0')), 1, any) &
      apply(sapply(data[1:3], Negate(`%in%`), c('./.')), 1, all) &
      data$f == "L", ]
#     a   b   c   d f
# 1 0/0 0/1 1/0 1/0 L

Broken down:

sapply(data[1:4], `%in%`, c('0/1','1/0'))
#          a     b     c     d
# [1,] FALSE  TRUE  TRUE  TRUE
# [2,]  TRUE FALSE FALSE FALSE
# [3,] FALSE  TRUE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE

That gives us the instances in the first four columns with one of the two "wanted" patterns. We want rows where any of the columns have it, so we "any" across them:

apply(sapply(data[1:4], `%in%`, c('0/1','1/0')), 1, any)
# [1]  TRUE  TRUE  TRUE FALSE

Similarly, find those with the "not wanted" patterns:

sapply(data[1:3], Negate(`%in%`), c('./.'))
#         a     b    c
# [1,] TRUE  TRUE TRUE
# [2,] TRUE FALSE TRUE
# [3,] TRUE  TRUE TRUE
# [4,] TRUE  TRUE TRUE
apply(sapply(data[1:3], Negate(`%in%`), c('./.')), 1, all) # notice "all", not "any"
# [1]  TRUE FALSE  TRUE  TRUE

Now we want "L" in the last column (=="L"), straight-forward to chain them logically with &.

Upvotes: 2

Kamil Bartoń
Kamil Bartoń

Reputation: 1562

Similar to the previous answer:

apply(data[, 1:4], 1, function(a) any(a %in% c("0/1","1/0")) && !any(a[1:3] == "./.")) & data$f == "L"

Upvotes: 1

TheN
TheN

Reputation: 513

You can first concatenate the columns a, b, c, and d together.

data[, abcd := paste(a, b, c, d)]

Then, I would create another new column which will tell me whether the conditions are met. I first set it to all FALSE.

data[, Selection := F]

Next, only for the columns that meet the conditions are being assigned TRUE.

(1) having "0/1" or "1/0" in a, b, c, or d, or simply abcd

(2) not having "./." in abcd

(3) having "L" in column f

data[(grepl("0/1", abcd) | grepl("1/0", abcd)) & !grepl("\\./\\.", abcd) &
     f == "L",
     Selection := T]

From here, I can select the row(s) needed by

data[(Selection), ]

Upvotes: 0

Shree
Shree

Reputation: 11140

Here's a simple solution using apply from base R -

test <- apply(data, 1, function(x) {
  any(x %in% c("0/1", "1/0")) & !any(x == "./.") & x["f"] == "L"
})

data[test, ]

#     a   b   c   d f
# 1 0/0 0/1 1/0 1/0 L

Upvotes: 0

chinsoon12
chinsoon12

Reputation: 25225

Another option:

data[f=="L", .SD[apply((.SD=="1/0" | .SD=="0/1") & !apply(.SD=="./.",1,any), 1, any)], 
    .SDcols=c("a","b","c","d")]

Upvotes: 1

Evan Friedland
Evan Friedland

Reputation: 3194

This is by no means faster than a working data.table solution but this works with base R:

dat <- data.frame(a = c("0/0", "0/1", "0/0", "0/0" ),
                   b = c("0/1", "./.", "0/1", "0/0"),
                   c = c("1/0", "0/0", "1/1", "0/0"),
                   d = c("1/0", "0/0", "1/1", "0/0"),                       
                   f = c("L", "L", "T", "L"))

dat
f <- which(colnames(dat) == 'f')
rows <- apply(dat, 1, function(x)  x[f] == "L" & !any("./." == x[-f]) & any("0/1" == x[-f]) | any("1/0" == x[-f]) )
dat[rows,]

It uses the apply function to apply a function by row on the data.

Upvotes: 1

Related Questions