Mus

Reputation: 7530

How can I delete rows if a column contains a certain value?

I have a data frame that has a classification column which contains four values: D1, D2, D8, and RD.

I want to remove all records (rows) where the classification is either D1 or RD.

I have tried this:

df[(df$classification == "D1" | df$classification == "RD"), ] <- NULL

And:

df[df$classification == "D1" | df$classification == "RD", ] <- NULL

But I receive this error:

Error in x[[jj]][iseq] <- vjj : replacement has length zero

I have searched, but the only explanations I have found concern attempts to access an element at position 0, which won't work because R is 1-indexed.

Where am I going wrong?

Sample data frame:

structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 
37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 
63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 
76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 
89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 
28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 
41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 
54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 
67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 
80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 
93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 
58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 
71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 
84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 
97L, 98L, 99L, 100L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 
37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 
63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 
76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 
89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L), 
    classification = c("D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", 
    "RD", "D1", "D2", "D8", "RD", "D1", "D2", "D8", "RD", "D1", 
    "D2", "D8", "RD")), class = "data.frame", row.names = c(NA, 
-400L))

Upvotes: 2

Views: 6524

Answers (3)

jay.sf

Reputation: 72633

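Using grepl(), which returns a logical mask and so still gives the right result when no row matches (negative indexing with grep() would return zero rows in that case):

df <- df[!grepl("^(D1|RD)$", df$classification), ]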
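To illustrate how grep() and grepl() differ for this kind of filtering, here is a minimal sketch on a made-up data frame. The name toy and its values are assumptions for illustration, not the asker's data. It covers the case where no row matches the pattern.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(id = 1:4,
                  classification = c("D2", "D8", "D2", "D8"),
                  stringsAsFactors = FALSE)

# No row matches, so grep() returns integer(0) ...
idx <- grep("D1|RD", toy$classification)

# ... and negative indexing with integer(0) selects no rows at all.
dropped_all <- toy[-idx, ]

# A logical mask from grepl() keeps every non-matching row instead.
kept <- toy[!grepl("D1|RD", toy$classification), ]

nrow(dropped_all)
nrow(kept)
```

With the sample data in the question, every pattern has matches, so both forms agree there. The difference only shows up when nothing matches.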

Upvotes: 0

Anoushiravan R

Reputation: 21908

I think you can try this instead:

df <- df[!(df$classification == "D1" | df$classification == "RD"), ]

Or a tidyverse approach would be:

library(dplyr)

df %>%
  filter(!classification %in% c("D1", "RD"))

Or again a base R solution, this time with subset():

df <- subset(df, !(classification == "RD" | classification == "D1"))
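For context on why the original `<- NULL` attempt fails, here is a rough sketch on a made-up data frame (toy, no_id, row_attempt and kept are hypothetical names). Assigning NULL is how a column is removed from a data frame. It is not supported for rows, so rows are dropped by keeping the complement.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(id = 1:4,
                  classification = c("D1", "D2", "D8", "RD"),
                  stringsAsFactors = FALSE)

# Assigning NULL to a column removes that column.
no_id <- toy
no_id$id <- NULL

# Assigning NULL to a set of rows is the pattern from the question;
# tryCatch() records whether it raised an error.
row_attempt <- tryCatch({
  bad <- toy
  bad[bad$classification %in% c("D1", "RD"), ] <- NULL
  "no error"
}, error = function(e) "error")

# Rows are removed by subsetting to the rows that should stay.
kept <- toy[!toy$classification %in% c("D1", "RD"), ]

names(no_id)
row_attempt
kept
```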

Upvotes: 1

Russ Hyde

Reputation: 2269

It's better to think "how do I create an object in the form I want" than "how do I manipulate this object in place". So you can build the filtered data frame and assign it back to df:

df <- df[!(df$classification == "D1" | df$classification == "RD"), ]

or, the slightly easier to maintain:

df <- df[!df$classification %in% c("D1", "RD"), ]
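One detail worth checking when negating a compound condition is operator precedence: in R, `!` binds more loosely than `==` but more tightly than `|`, so the whole `|` expression needs parentheses. The sketch below uses a made-up data frame (toy and the result names are assumptions for illustration) to compare the variants.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(id = 1:4,
                  classification = c("D1", "D2", "D8", "RD"),
                  stringsAsFactors = FALSE)

# Without parentheses, `!` negates only the first comparison,
# so the "RD" row is kept.
unparenthesised <- toy[!toy$classification == "D1" |
                         toy$classification == "RD", ]

# With parentheses, the whole condition is negated.
parenthesised <- toy[!(toy$classification == "D1" |
                         toy$classification == "RD"), ]

# %in% avoids the issue because there is only one comparison to negate.
with_in <- toy[!toy$classification %in% c("D1", "RD"), ]

unparenthesised$classification
parenthesised$classification
with_in$classification
```

The `%in%` form also scales more easily if further values need to be excluded later, since only the vector changes.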

Upvotes: 2
