ibm
ibm

Reputation: 874

filter across columns using row predicate to keep only negative cell values

I'm interested in reducing a very large correlation matrix to keep only the cells and rows that have negative associations. I have already reduced it somewhat using df%>%filter_all(any_vars(.<0)) This is an example of the subset I get afterwards. How can I select the columns on the basis of their cell content without having to select by name: those that have any negative value (D and E)? I am partial to a tidyverse answer if possible but I'll take what I can get. I thought some sort of across() + if_else() since I don't mind turning all non-negative numbers into NA but I couldn't figure it out.

ex <- tribble(~A, ~B, ~C, ~D, ~E,
       "L", 0.133, 0.446, -0.0190, NA,
        "M", 0.166, 0.136,  0.0893, 0.0755,
        "N", 0.110, 0.159,  0.0872, -0.186,
        "O", 0.0161, NA, 0.0272, -0.0767,
      "P",  0.147, 0.0864, 0.0417, -0.0629)

Upvotes: 1

Views: 189

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389325

Base R option -

#Keep only the columns that have at least one negative value
result <- Filter(function(x) is.numeric(x) && any(x < 0, na.rm = TRUE), ex)
#Turn the values greater than 0 to NA
result[result > 0] <- NA
result

#     D       E
#   <dbl>   <dbl>
#1 -0.019 NA     
#2 NA     NA     
#3 NA     -0.186 
#4 NA     -0.0767
#5 NA     -0.0629

Upvotes: 0

akrun
akrun

Reputation: 887971

We can use select with where. Create two conditions in a short-circuit (&&) to match the column type as numeric and there are any non-NA values less than 0

library(dplyr)
ex %>%
      dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))

-output

# A tibble: 5 x 2
        D       E
    <dbl>   <dbl>
1 -0.019  NA     
2  0.0893  0.0755
3  0.0872 -0.186 
4  0.0272 -0.0767
5  0.0417 -0.0629

If we want to keep any rows that are less than 0 from the columns selected

library(purrr)
ex %>%
       dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0))) %>%
       filter(if_any(everything(), ~ . < 0))
# A tibble: 4 x 2
        D       E
    <dbl>   <dbl>
1 -0.019  NA     
2  0.0872 -0.186 
3  0.0272 -0.0767
4  0.0417 -0.0629

If we want to keep the other column types as well

ex %>%
     dplyr::select(c(where(negate(is.numeric)),
         where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))) %>% 
     filter(if_any(where(is.numeric), ~ . < 0))
# A tibble: 4 x 3
  A           D       E
  <chr>   <dbl>   <dbl>
1 L     -0.019  NA     
2 N      0.0872 -0.186 
3 O      0.0272 -0.0767
4 P      0.0417 -0.0629

Upvotes: 1

Related Questions