Reputation: 51
I am looking to create a new data.table that contains all rows with at least one negative value.
Here is a simple reproducible datatable:
dt <- data.table(
ID = c(42, 43, 44),
Stage_1 = c(-6, 7, 4),
Stage_2 = c(-15, 4, -8),
Stage_3 = c(-20, 2, -5)
)
# ID Stage_1 Stage_2 Stage_3
# 1: 42 -6 -15 -20 # <~~ row to be selected (> 0 negative values)
# 2: 43 7 4 2
# 3: 44 4 -8 -5 # <~~ row to be selected (> 0 negative values)
My desired output would be:
dt2 <- data.table(
ID = c(42, 44),
Stage_1 = c(-6, 4),
Stage_2 = c(-15, -8),
Stage_3 = c(-20, -5)
)
# ID Stage_1 Stage_2 Stage_3
# 1: 42 -6 -15 -20
# 2: 44 4 -8 -5
ID 44 for example, has two negative values but I would like to include all of their rows from the main datatable. Basically all rows with a negative value in any of their columns I would like to add to a new datatable that contains all their information.
The actual dataset I'm working with has ~50 stage columns, so the most efficient solution is what I'm after.
Upvotes: 1
Views: 943
Reputation: 405
dt <- data.table::data.table(
ID = c(42, 43, 44),
Stage_1 = c(-6, 7, 4),
Stage_2 = c(-15, 4, -8),
Stage_3 = c(-20, 2, -5)
)
dt
#> ID Stage_1 Stage_2 Stage_3
#> 1: 42 -6 -15 -20
#> 2: 43 7 4 2
#> 3: 44 4 -8 -5
apply()
function:dt[apply(dt[, -'ID'], 1, min) < 0, ]
#> ID Stage_1 Stage_2 Stage_3
#> 1: 42 -6 -15 -20
#> 2: 44 4 -8 -5
rowMeans()
function based on the fact that average of a boolean vector with at least one true value is always greater than zero (thanks to @utubun):dt[rowMeans(dt[, -'ID'] < 0) > 0, ]
#> ID Stage_1 Stage_2 Stage_3
#> 1: 42 -6 -15 -20
#> 2: 44 4 -8 -5
rowMins()
function of the fBasics
package:dt[fBasics::rowMins(dt[, -'ID']) < 0, ]
#> ID Stage_1 Stage_2 Stage_3
#> 1: 42 -6 -15 -20
#> 2: 44 4 -8 -5
# Created on 2021-02-19 by the reprex package (v0.3.0.9001)
(Related to Equivalent to rowMeans() for min())
Regards,
Upvotes: 5
Reputation: 102900
A data.table
option using sign
+ rowSums
> dt[, .SD[rowSums(sign(.SD)) < length(.SD)], ID]
ID Stage_1 Stage_2 Stage_3
1: 42 -6 -15 -20
2: 44 4 -8 -5
Upvotes: 0
Reputation: 4505
Don't know if it is really good style, but as a shorthand you can use this:
dt[ dt[, any(.SD < 0), by = ID][['V1']], ]
# ID Stage_1 Stage_2 Stage_3
# 1: 42 -6 -15 -20
# 2: 44 4 -8 -5
¹However, if ID
is not unique, it can produce unexpected results.
Upvotes: 1
Reputation: 460
dt[Stage_1 < 0 | Stage_2 < 0 | Stage_3 < 0]
# ID Stage_1 Stage_2 Stage_3
# 1: 42 -6 -15 -20
# 2: 44 4 -8 -5
Edit after OP's clarification:
With many columns:
# Find all the rows with at least one negative in each column
rowsDT <- dt[, lapply(.SD, function(x) which(x < 0)), .SDcols = -'ID']
# Reduce to a vector by applying union
rows <- Reduce(union, rowsDT)
# Extract from the main data.table
dt[rows]
Upvotes: 2