Reputation: 51

Select rows with at least one (any) negative value

I am looking to create a new data.table that contains all rows with at least one negative value.

Here is a simple reproducible datatable:

dt <- data.table(
  ID = c(42, 43, 44),
  Stage_1 = c(-6, 7, 4),
  Stage_2 = c(-15, 4, -8),
  Stage_3 = c(-20, 2, -5)
)

#    ID Stage_1 Stage_2 Stage_3
# 1: 42      -6     -15     -20 # <~~ row to be selected (> 0 negative values)
# 2: 43       7       4       2
# 3: 44       4      -8      -5 # <~~ row to be selected (> 0 negative values)

My desired output would be:

dt2 <- data.table(
  ID = c(42, 44),
  Stage_1 = c(-6, 4),
  Stage_2 = c(-15, -8),
  Stage_3 = c(-20, -5)
)

#    ID Stage_1 Stage_2 Stage_3
# 1: 42      -6     -15     -20
# 2: 44       4      -8      -5

ID 44 for example, has two negative values but I would like to include all of their rows from the main datatable. Basically all rows with a negative value in any of their columns I would like to add to a new datatable that contains all their information.

The actual dataset I'm working with has ~50 stage columns, so the most efficient solution is what I'm after.

Upvotes: 1

Answers (4)

barboulotte

Reputation: 405

dt <- data.table::data.table(
  ID = c(42, 43, 44),
  Stage_1 = c(-6, 7, 4),
  Stage_2 = c(-15, 4, -8),
  Stage_3 = c(-20, 2, -5)
)

dt
#>    ID Stage_1 Stage_2 Stage_3
#> 1: 42      -6     -15     -20
#> 2: 43       7       4       2
#> 3: 44       4      -8      -5

with apply() function:

dt[apply(dt[, -'ID'], 1, min) < 0, ]
#>    ID Stage_1 Stage_2 Stage_3
#> 1: 42      -6     -15     -20
#> 2: 44       4      -8      -5

with rowMeans() function based on the fact that average of a boolean vector with at least one true value is always greater than zero (thanks to @utubun):

dt[rowMeans(dt[, -'ID'] < 0) > 0, ]
#>    ID Stage_1 Stage_2 Stage_3
#> 1: 42      -6     -15     -20
#> 2: 44       4      -8      -5

and also with the rowMins() function of the fBasics package:

dt[fBasics::rowMins(dt[, -'ID']) < 0, ]
#>    ID Stage_1 Stage_2 Stage_3
#> 1: 42      -6     -15     -20
#> 2: 44       4      -8      -5

# Created on 2021-02-19 by the reprex package (v0.3.0.9001)

Regards,

Upvotes: 5

ThomasIsCoding

Reputation: 102900

A data.table option using sign + rowSums

> dt[, .SD[rowSums(sign(.SD)) < length(.SD)], ID]
   ID Stage_1 Stage_2 Stage_3
1: 42      -6     -15     -20
2: 44       4      -8      -5

Upvotes: 0

utubun

Reputation: 4505

Don't know if it is really good style, but as a shorthand you can use this:

dt[ dt[, any(.SD < 0), by = ID][['V1']], ]

#    ID Stage_1 Stage_2 Stage_3
# 1: 42      -6     -15     -20
# 2: 44       4      -8      -5

¹However, if ID is not unique, it can produce unexpected results.

Upvotes: 1

Drumy

Reputation: 460

dt[Stage_1 < 0 | Stage_2 < 0 | Stage_3 < 0]

#    ID Stage_1 Stage_2 Stage_3
# 1: 42      -6     -15     -20
# 2: 44       4      -8      -5

Edit after OP's clarification:

With many columns:

# Find all the rows with at least one negative in each column
rowsDT <- dt[, lapply(.SD, function(x) which(x < 0)), .SDcols = -'ID']
# Reduce to a vector by applying union
rows <- Reduce(union, rowsDT)
# Extract from the main data.table
dt[rows]

Upvotes: 2

Select rows with at least one (any) negative value

Answers (4)

Related Questions