Reputation: 180
I have a data frame in R that contains multiple columns. The values in these columns can be negative or positive. As a result, I have rows where all values are positive, rows where all values are negative, and rows with both positive and negative values. I want to extract only those rows that contain both positive and negative values; zeros should count as neither.
Let's do this with a dummy dataset:
x <- data.frame("contrast_1" = c(-1.2,1.3,1.4,-1.2,0), "contrast_2" = c(-1.8,2.3,2.4,0.02,-8), "contrast_3" = c(-0.23,-4.5,0.4,-0.24,-1.23))
row.names(x) <- c('gene_1', 'gene_2', 'gene_3', 'gene_4', 'gene_5')
The data frame looks like this:
contrast_1 contrast_2 contrast_3
gene_1 -1.2 -1.80 -0.23
gene_2 1.3 2.30 -4.50
gene_3 1.4 2.40 0.40
gene_4 -1.2 0.02 -0.24
gene_5 0.0 -8.00 -1.23
In this data frame, genes 2 and 4 contain both positive and negative values: these are the rows I want to extract. Gene 5 contains negative values, and a zero value. I do not want gene 5.
I solved this problem with the following code:
library(dplyr)
# rows with no negative values (all values >= 0, so zeros are allowed)
x_UP = x %>% filter_at(colnames(x), all_vars(. >= 0))
# rows with no positive values (all values <= 0, so zeros are allowed)
x_DOWN = x %>% filter_at(colnames(x), all_vars(. <= 0))
# combine the two data frames
removed = rbind(x_UP, x_DOWN)
# drop those rows from x, leaving only the rows that contain both signs
subset = x[!row.names(x) %in% rownames(removed), ]
The output looks like this:
contrast_1 contrast_2 contrast_3
gene_2 1.3 2.30 -4.50
gene_4 -1.2 0.02 -0.24
As you can see, this code works, because it only selected genes 2 and 4. However, I feel I should be able to accomplish this in a more elegant way. Hence my question to you: are there better ways to do this? I am mostly interested in a solution that could immediately select all the rows that have both positive and negative values, instead of first extracting the rows that have only positive or only negative values.
Thanks already!
Upvotes: 4
Views: 3175
Reputation: 887088
An option is to use sign with all. We could use c_across with filter after doing a rowwise:
library(dplyr)
x %>%
  rowwise() %>%
  # keep a row only if its signs include both -1 and 1
  filter(all(c(-1, 1) %in% sign(c_across(everything())))) %>%
  ungroup()
# A tibble: 2 x 3
# contrast_1 contrast_2 contrast_3
# <dbl> <dbl> <dbl>
#1 1.3 2.3 -4.5
#2 -1.2 0.02 -0.24
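As a sketch of a variant that avoids rowwise, and assuming dplyr 1.0.4 or later (where if_any is available), the same condition could be written as two if_any tests inside filter. The name mixed is made up here for illustration.

```r
library(dplyr)

# Same dummy data as in the question
x <- data.frame(
  contrast_1 = c(-1.2, 1.3, 1.4, -1.2, 0),
  contrast_2 = c(-1.8, 2.3, 2.4, 0.02, -8),
  contrast_3 = c(-0.23, -4.5, 0.4, -0.24, -1.23),
  row.names = paste0("gene_", 1:5)
)

# Keep rows with at least one value > 0 AND at least one value < 0.
# Zeros satisfy neither test, so a row of zeros and negatives is dropped.
# `mixed` is an illustrative name, not part of the original answer.
mixed <- x %>%
  filter(
    if_any(everything(), ~ .x > 0),
    if_any(everything(), ~ .x < 0)
  )
```

Multiple conditions passed to filter are combined with AND, so both tests must hold for a row to be kept.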
Or using base R
subset(x, (rowSums(sign(x) < 0) > 0) & (rowSums(sign(x) > 0) > 0))
# contrast_1 contrast_2 contrast_3
#gene_2 1.3 2.30 -4.50
#gene_4 -1.2 0.02 -0.24
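To make the logic of the base R condition easier to follow, here is a minimal sketch that splits it into named steps and compares against zero directly instead of going through sign. The names has_pos, has_neg and mixed are made up for illustration, and the sketch assumes all columns are numeric with no NA values.

```r
# Same dummy data as in the question
x <- data.frame(
  contrast_1 = c(-1.2, 1.3, 1.4, -1.2, 0),
  contrast_2 = c(-1.8, 2.3, 2.4, 0.02, -8),
  contrast_3 = c(-0.23, -4.5, 0.4, -0.24, -1.23),
  row.names = paste0("gene_", 1:5)
)

# x > 0 gives a logical matrix; rowSums counts the TRUEs in each row
has_pos <- rowSums(x > 0) > 0   # at least one strictly positive value
has_neg <- rowSums(x < 0) > 0   # at least one strictly negative value

# Keep only rows that have both; drop = FALSE keeps a data frame
# even if a single column remains
mixed <- x[has_pos & has_neg, , drop = FALSE]
```

Because the comparisons are strict, zeros count toward neither has_pos nor has_neg, which is why gene_5 is excluded.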
Upvotes: 1