R: select rows in data frame that contain both positive and negative values

Question

I have a data frame in R with multiple numeric columns. The values in these columns can be negative or positive. As a result, some rows are entirely positive, some are entirely negative, and some contain both positive and negative values. I want to extract only the rows that contain both at least one positive and at least one negative value; zeros should not count as either sign.

Let's do this with a dummy dataset:

x <- data.frame("contrast_1" = c(-1.2,1.3,1.4,-1.2,0), "contrast_2" = c(-1.8,2.3,2.4,0.02,-8), "contrast_3" = c(-0.23,-4.5,0.4,-0.24,-1.23))
row.names(x) <- c('gene_1', 'gene_2', 'gene_3', 'gene_4', 'gene_5')

The data frame looks like this:

       contrast_1 contrast_2 contrast_3
gene_1       -1.2      -1.80      -0.23
gene_2        1.3       2.30      -4.50
gene_3        1.4       2.40       0.40
gene_4       -1.2       0.02      -0.24
gene_5        0.0      -8.00      -1.23

In this data frame, genes 2 and 4 contain both positive and negative values: these are the rows I want to extract. Gene 5 contains negative values, and a zero value. I do not want gene 5.

I solved this problem with the following code:

library(dplyr) 

# select all the rows that have no negative values (every value >= 0)
x_UP = x %>% filter_at(colnames(x), all_vars(. >= 0))

# select all the rows that have no positive values (every value <= 0)
x_DOWN = x %>% filter_at(colnames(x), all_vars(. <= 0))

# combine the data frames
removed = rbind(x_UP, x_DOWN)

# remove the single-sign rows from data frame x
subset = x[!row.names(x) %in% rownames(removed), ]

The output looks like this:

       contrast_1 contrast_2 contrast_3
gene_2        1.3       2.30      -4.50
gene_4       -1.2       0.02      -0.24

As you can see, this code works, because it only selected genes 2 and 4. However, I feel I should be able to accomplish this in a more elegant way. Hence my question to you: are there better ways to do this? I am mostly interested in a solution that could immediately select all the rows that have both positive and negative values, instead of first extracting the rows that have only positive or only negative values.

Thanks already!

akrun · Accepted Answer

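One option combines sign with all. After rowwise, use c_across inside filter to check that the signs found in each row include both -1 and 1. A zero has sign 0, so it matches neither. Note that the result is a tibble, so the row names are dropped.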

library(dplyr)
x %>%
   rowwise %>%
   filter(all(c(-1, 1)  %in% sign(c_across(everything())) )) %>%
   ungroup
# A tibble: 2 x 3
#  contrast_1 contrast_2 contrast_3
#       <dbl>      <dbl>      <dbl>
#1        1.3       2.3       -4.5
#2       -1.2       0.02      -0.24
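
If the rowwise step turns out to be slow on a large data frame, the same test can probably be expressed column-wise. This is only a sketch, not part of the original answer, and it assumes dplyr 1.0.4 or later, where if_any() is available. The name mixed is just illustrative.

```r
library(dplyr)

# Same dummy data as in the question
x <- data.frame(
  contrast_1 = c(-1.2, 1.3, 1.4, -1.2, 0),
  contrast_2 = c(-1.8, 2.3, 2.4, 0.02, -8),
  contrast_3 = c(-0.23, -4.5, 0.4, -0.24, -1.23)
)
row.names(x) <- c("gene_1", "gene_2", "gene_3", "gene_4", "gene_5")

# Both conditions must hold for a row to be kept
mixed <- x %>%
  filter(
    if_any(everything(), ~ .x > 0),  # at least one strictly positive value
    if_any(everything(), ~ .x < 0)   # at least one strictly negative value
  )
mixed
```

Because both comparisons are strict, a zero counts as neither positive nor negative, so gene 5 is excluded just as in the rowwise version.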

Or using base R

subset(x,  (rowSums(sign(x) < 0) > 0) & (rowSums(sign(x) > 0) > 0))
#       contrast_1 contrast_2 contrast_3
#gene_2        1.3       2.30      -4.50
#gene_4       -1.2       0.02      -0.24
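
The base R condition can be written a little more directly, since comparing the data frame itself against zero already gives a logical matrix. A minimal sketch, assuming every column is numeric and there are no NA values. The helper name has_both_signs is made up for illustration.

```r
# Same dummy data as in the question
x <- data.frame(
  contrast_1 = c(-1.2, 1.3, 1.4, -1.2, 0),
  contrast_2 = c(-1.8, 2.3, 2.4, 0.02, -8),
  contrast_3 = c(-0.23, -4.5, 0.4, -0.24, -1.23)
)
row.names(x) <- c("gene_1", "gene_2", "gene_3", "gene_4", "gene_5")

# Hypothetical helper: TRUE for rows with at least one strictly positive
# and at least one strictly negative value. Zeros satisfy neither test.
has_both_signs <- function(df) {
  rowSums(df > 0) > 0 & rowSums(df < 0) > 0
}

# Logical row indexing keeps the gene names as row names
mixed <- x[has_both_signs(x), , drop = FALSE]
mixed
```

If the data can contain missing values, passing na.rm = TRUE to both rowSums calls is one way to ignore them when counting.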
