Reputation: 874
I'm interested in reducing a very large correlation matrix to keep only the cells and rows that have negative associations. I have already reduced it somewhat using df%>%filter_all(any_vars(.<0))
This is an example of the subset I get afterwards.
How can I select the columns on the basis of their cell content without having to select by name: those that have any negative value (D and E)?
I am partial to a tidyverse answer if possible but I'll take what I can get. I thought some sort of across()
+ if_else()
since I don't mind turning all non-negative numbers into NA
but I couldn't figure it out.
ex <- tribble(~A, ~B, ~C, ~D, ~E,
"L", 0.133, 0.446, -0.0190, NA,
"M", 0.166, 0.136, 0.0893, 0.0755,
"N", 0.110, 0.159, 0.0872, -0.186,
"O", 0.0161, NA, 0.0272, -0.0767,
"P", 0.147, 0.0864, 0.0417, -0.0629)
Upvotes: 1
Views: 189
Reputation: 389325
Base R option -
#Keep only the columns that have at least one negative value
result <- Filter(function(x) is.numeric(x) && any(x < 0, na.rm = TRUE), ex)
#Turn the values greater than 0 to NA
result[result > 0] <- NA
result
# D E
# <dbl> <dbl>
#1 -0.019 NA
#2 NA NA
#3 NA -0.186
#4 NA -0.0767
#5 NA -0.0629
Upvotes: 0
Reputation: 887971
We can use select
with where
. Create two conditions in a short-circuit (&&
) to match the column type as numeric
and there are any
non-NA values less than 0
library(dplyr)
ex %>%
dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))
-output
# A tibble: 5 x 2
D E
<dbl> <dbl>
1 -0.019 NA
2 0.0893 0.0755
3 0.0872 -0.186
4 0.0272 -0.0767
5 0.0417 -0.0629
If we want to keep any rows that are less than 0 from the columns selected
library(purrr)
ex %>%
dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0))) %>%
filter(if_any(everything(), ~ . < 0))
# A tibble: 4 x 2
D E
<dbl> <dbl>
1 -0.019 NA
2 0.0872 -0.186
3 0.0272 -0.0767
4 0.0417 -0.0629
If we want to keep the other column types as well
ex %>%
dplyr::select(c(where(negate(is.numeric)),
where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))) %>%
filter(if_any(where(is.numeric), ~ . < 0))
# A tibble: 4 x 3
A D E
<chr> <dbl> <dbl>
1 L -0.019 NA
2 N 0.0872 -0.186
3 O 0.0272 -0.0767
4 P 0.0417 -0.0629
Upvotes: 1