lukeg
lukeg

Reputation: 1357

subsetting with dplyr or other packages in R

So we have this basic data

A <- c(1,1,1,2,2,2,3,3,3)
B <- c(1,0,0,1,0,0,1,0,0)
C <- c(2,2,3,4,3,3,3,4,3)
Data <- data.frame(A,B,C)

And we now want to filter it such that

We look at each case where C=3 and B=1 and find it A value using this A value, we want to search for C=4 and B=0 and delete it

So graphically,

enter image description here

we want to remove the red row and keep the green one

I ideally want to use dplyr but am open to other alternatives

Upvotes: 2

Views: 179

Answers (5)

Frank
Frank

Reputation: 66819

Here's another data.table approach. First, read it to a keyed data.table:

require(data.table)
DT <- data.table(Data,key="C,B,A")

Identify the A value the OP doesn't like

myA <- DT[J(3,1)]$A

Exclude

DT[!J(4,0,myA)]

This will also work if myA is a vector.

Upvotes: 2

zx8754
zx8754

Reputation: 56004

Base R, I think this is what you need:

# Column A values to consider for filter
C3_B1 <- Data[Data$C==3 & Data$B==1,"A"]

# Filter out rows where C==4 and B==0
Data[ !(Data$A %in% C3_B1 &
          Data$C==4 &
          Data$B==0),]

# Output - row 8 is removed
#   A B C
# 1 1 1 2
# 2 1 0 2
# 3 1 0 3
# 4 2 1 4
# 5 2 0 3
# 6 2 0 3
# 7 3 1 3
# 9 3 0 3

Using sqldf package:

sqldf(c("DELETE FROM Data 
          WHERE A IN (SELECT A FROM Data 
                       WHERE  C = 3 AND B = 1) 
                AND C = 4 
                AND B = 0",
        "SELECT * FROM Data "))

Upvotes: 4

Shorpy
Shorpy

Reputation: 1579

Another solution, probably a bit slower than some of the above. But might be a bit more transparent to the reader. Also easier to throw into a function if you want to run different values of B and C.

a_values <- Data %>% 
  filter(B == 1, C == 3) %>% 
  select(A) %>% 
  unique()

Data <- Data %>% 
  filter(!(A %in% a_values & B == 0 & C == 4))

Edit: forgot to negate!

Upvotes: 0

dimitris_ps
dimitris_ps

Reputation: 5951

With dplyr

Data %>% filter(!(A==select(filter(Data, C==3 & B==1), A)[,1] & C==4 & B ==0))

Its a two step process in essence.

The 1st one is:

select(filter(Data, C==3 & B==1), A)

Filter Data where C==3 and B==1 then select only column A. lets name this temp.

The 2nd step is:

Data %>% filter(!(A==temp[,1] & C==4 & B ==0))

a simple filtering.

Upvotes: 2

DaveTurek
DaveTurek

Reputation: 1297

You mention other alternatives, so here is data.table

library(data.table)
 d=as.data.table(Data)
 d[!((A==d[B==1 & C==3,A])&(C==4&B==0))]

But not sure how this will work on a more complicated case with more rows to delete.

Upvotes: 1

Related Questions