Reputation: 637

Using dplyr to clean data

Take the following example:

set.seed(123456)
A <- 1:500
B <- sample(1:50, 500, replace = TRUE)
C <- rep(0,500)
df1 <- data.frame(A,B,C)
df1$C[1] <- 1

library(dplyr)

Now I want to remove the rows where B differs from df1$B[1] by more than 10.

I have tried the following code using the dplyr package:

diff_in_B_less_than_10 <- df1 %>%
  filter(abs( B[C == 1] - B[C == 0]) <= 10)

Upvotes: 0

Answers (3)

Reputation: 13570

You can use between:

df1 %>% filter(between(B - B[1], -10, 10))  # or
df1 %>% filter(B - B[1] >= -10 & B - B[1] <= 10)
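
As a quick sanity check, between() is inclusive on both ends, so it should keep the same rows as an abs(...) <= 10 test. Here is a minimal sketch, assuming the df1 from the question (rebuilt here so the snippet runs on its own); the names ref, via_between and via_abs are just illustrative:

```r
library(dplyr)

# Rebuild the question's data so the snippet is self-contained
set.seed(123456)
df1 <- data.frame(A = 1:500,
                  B = sample(1:50, 500, replace = TRUE),
                  C = rep(0, 500))
df1$C[1] <- 1

ref <- df1$B[1]  # reference value taken from the first row

via_between <- df1 %>% filter(between(B - ref, -10, 10))
via_abs     <- df1 %>% filter(abs(B - ref) <= 10)

# Compare the two results row for row
identical(via_between, via_abs)
```

Note that neither version looks at C, so the reference row itself (difference 0) stays in the result.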

Upvotes: 0

Reputation: 1205

This uses the same ideas and gets you there:

diff_in_B_less_than_10 <- df1 %>% filter(abs(B - df1$B[[1]]) <= 10, C==0)

We just separated the two concerns: computing the difference and filtering on C. filter() combines comma-separated conditions with a logical AND.
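
To illustrate the AND behaviour, here is a small sketch, assuming the df1 from the question (rebuilt here so it runs standalone). It compares the comma form with an explicit &, and checks that the C == 0 condition drops the reference row; with_comma and with_and are illustrative names:

```r
library(dplyr)

# Rebuild the question's data so the snippet is self-contained
set.seed(123456)
df1 <- data.frame(A = 1:500,
                  B = sample(1:50, 500, replace = TRUE),
                  C = rep(0, 500))
df1$C[1] <- 1

# Comma-separated conditions vs. an explicit logical AND
with_comma <- df1 %>% filter(abs(B - df1$B[[1]]) <= 10, C == 0)
with_and   <- df1 %>% filter(abs(B - df1$B[[1]]) <= 10 & C == 0)

# Compare the two results
identical(with_comma, with_and)

# The reference row (A == 1, where C == 1) is excluded by C == 0
any(with_comma$A == 1)
```

If the reference row should be kept, leave out the C == 0 condition.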

Upvotes: 2

Reputation: 7153

Add a column holding the difference between df1$B and df1[1, "B"]:

df1$d <- df1$B - df1[1,"B"]

With dplyr, filter to keep the rows where that difference is between -10 and 10, then drop the helper column:

df2 <- df1 %>% filter(d <= 10 & d >= -10) %>% select(-d)
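
If you would rather not modify df1 at all, the helper column can be created inside the pipeline instead. This is only a sketch of that variant, assuming the df1 from the question (rebuilt here) and using dplyr's mutate() and first():

```r
library(dplyr)

# Rebuild the question's data so the snippet is self-contained
set.seed(123456)
df1 <- data.frame(A = 1:500,
                  B = sample(1:50, 500, replace = TRUE),
                  C = rep(0, 500))
df1$C[1] <- 1

df2 <- df1 %>%
  mutate(d = B - first(B)) %>%   # temporary difference to the first B value
  filter(d >= -10 & d <= 10) %>% # keep rows within +/- 10 of the reference
  select(-d)                     # drop the temporary column again

names(df2)  # same columns as df1; d is gone
```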

Upvotes: 1