AwaitedOne
AwaitedOne

Reputation: 1012

How to extract same rows with positive and negative values

I am trying to get the rows which have some value in one column but positive and negative values in another. Input is the below data frame

 data <- data.frame(X = c(1,3,5,7,7,8,9,10,10,11,11,12,12),
                   Y = sample(36476545:36476557),
                   timepoint = c(0,1,0,-0.31,1,1,1,1,-1,1,1,1,1)
                   )

Output looks something like this

X        Y timepoint
4  7 36476557     -0.31
5  7 36476545      1.00
8 10 36476556      1.00
9 10 36476548     -1.00

I was looking at this link, but not what I am looking for.

Upvotes: 2

Views: 1178

Answers (2)

akrun
akrun

Reputation: 887088

After grouping by 'X', filter those have both negative and positive 'timepoint' by taking the sign of 'timepoint', get the number of distinct elements (n_distinct) is 2 (assuming there is no zero)

library(dplyr)
data %>% 
   group_by(X) %>% 
   filter(n_distinct(sign(timepoint)) == 2)
# A tibble: 4 x 3
# Groups:   X [2]
#      X        Y timepoint
#  <dbl>    <int>     <dbl>
#1     7 36476547     -0.31
#2     7 36476556      1   
#3    10 36476549      1   
#4    10 36476557     -1   

NOTE: 'Y' values are different as the example was created with no set.seed


If there is zero as well

data %>% 
       group_by(X) %>%
       filter(all(c(-1, -1) %in% sign(timepoint)))

Or using base R with ave

data[with(data, ave(sign(timepoint), X, FUN = function(x) length(unique(x))) == 2),]

Or another base R option with table

subset(data, X %in% names(which(rowSums(with(subset(data, 
      timepoint != 0), table(X, sign(timepoint))) > 0) == 2)))

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 388982

In base R, we can use ave and select groups where there is at least one timepoint value greater than 0 and one timepoint value less than 0.

data[with(data, ave(timepoint > 0, X, FUN = function(x) any(x) & any(!x))), ]

#   X        Y timepoint
#4  7 36476553     -0.31
#5  7 36476551      1.00
#8 10 36476556      1.00
#9 10 36476554     -1.00

In dplyr this would be

library(dplyr)

data %>%
  group_by(X) %>%
  filter(any(timepoint > 0) & any(timepoint < 0))

Upvotes: 0

Related Questions