Math Avengers

Reputation: 792

Remove rows with a specific number in R

I want to remove rows containing the text "student2". However, I don't want to remove rows like "student22", "student23", etc. For example:

       Student.Code Values
1  canada.student12      2
2   canada.student2      3 # remove
3  canada.student23      5 # keep
4       US.student2      6 # remove
5      US.student32      2
6     Aus.student87    645
7  Turkey.student25      4 # keep

I used grepl("student2", example$Student.Code, fixed = TRUE), but it also finds (and so removes) rows like "student23".

Upvotes: 0

Views: 65

Answers (3)

Chris Ruehlemann

Reputation: 21400

Data:

df <- data.frame(
  Student = c("canada.student12", "canada.student2", "canada.student23","US.student2", "US.student32", "Aus.student87", "Turkey.student25"),
  Value = c(2,3,5,6,2,654,5)
)

Solution: (in base R)

The idea is to use grepl to match the values where student2 is followed by a word boundary (written \\b in an R regex string), that is, where the 2 is not followed by another digit, letter, or underscore, and then to exclude those rows with the negation operator !:

df[!grepl("student2\\b", df$Student),]
           Student Value
1 canada.student12     2
3 canada.student23     5
5     US.student32     2
6    Aus.student87   654
7 Turkey.student25     5
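To see which values the word-boundary pattern actually flags, here is a minimal sketch on a plain character vector. The vector name codes and the extra value "canada.student2.old" are made up for illustration and are not part of the question's data.

```r
# Hypothetical vector holding the student codes from the question
codes <- c("canada.student12", "canada.student2", "canada.student23",
           "US.student2", "US.student32", "Aus.student87", "Turkey.student25")

# TRUE where "student2" is followed by a word boundary
hits <- grepl("student2\\b", codes)

# Values that survive the filter
kept <- codes[!hits]
print(kept)

# Caveat: "." is not a word character, so a boundary also exists before it.
# This made-up value would therefore be flagged for removal as well.
dotted <- grepl("student2\\b", "canada.student2.old")
print(dotted)
```

If codes may continue after the number (as in the made-up "canada.student2.old"), anchoring the pattern to the end of the string is the stricter choice.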

Alternatively, you can also go the opposite way and match those patterns that you want to keep:

df[grepl("student(?=\\d{2,})", df$Student, perl = TRUE),]

Here, the idea is to use a positive lookahead to match student only if it is immediately followed by at least two digits (\\d{2,}). (Note that lookahead and lookbehind require perl = TRUE.)
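The "keep" pattern is not an exact mirror of the "remove" pattern, which a short sketch makes visible. The vector codes and the value "UK.student7" are assumptions added for illustration only.

```r
# Hypothetical vector: the question's codes plus one made-up single-digit code
codes <- c("canada.student12", "canada.student2", "canada.student23",
           "US.student2", "US.student32", "Aus.student87", "Turkey.student25",
           "UK.student7")

# Keep values where "student" is followed by two or more digits
keep <- grepl("student(?=\\d{2,})", codes, perl = TRUE)
print(codes[keep])

# For comparison: values that the word-boundary pattern would keep
keep_boundary <- !grepl("student2\\b", codes)
print(codes[keep_boundary])
```

On the question's seven values both approaches agree. They differ on the made-up "UK.student7": the lookahead drops it (only one digit), while the word-boundary version keeps it, so the lookahead variant is only safe if no single-digit codes other than 2 need to survive.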

Upvotes: 3

smingerson

Reputation: 1438

If you have a variable with an exact value you want to remove, don't use grep or grepl.

example <- tibble::tribble(
             ~Student.Code, ~Values,
        "canada.student12",      2L,
         "canada.student2",      3L,
        "canada.student23",      5L,
             "US.student2",      6L,
            "US.student32",      2L,
           "Aus.student87",    645L,
        "Turkey.student25",      4L
        )

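# base R: drop the row whose code equals this exact value
example <- example[example$Student.Code != "canada.student2",]
# or, in dplyr (namespaced so it is not confused with stats::filter)
example <- dplyr::filter(example, Student.Code != "canada.student2")
# for multiple values
example <- dplyr::filter(example, !(Student.Code %in% c("canada.student2", "US.student2")))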

fixed = TRUE does not help because it only means "treat the pattern as a literal string rather than a regular expression". The pattern is still matched anywhere inside each value; it does not mean "match only when this string is the whole value".
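The difference between a literal substring search and an exact comparison can be sketched as follows. The vector name codes is an assumption; endsWith() is a base R function (R 3.3.0 or later) shown here as one regex-free option when only the suffix is known.

```r
# Hypothetical vector holding the student codes from the question
codes <- c("canada.student12", "canada.student2", "canada.student23",
           "US.student2", "US.student32", "Aus.student87", "Turkey.student25")

# Literal substring search: also TRUE for student23 and student25
substring_hit <- grepl("student2", codes, fixed = TRUE)

# Exact comparison against the full values to remove
exact_hit <- codes %in% c("canada.student2", "US.student2")

# Suffix test without any regex
suffix_hit <- endsWith(codes, "student2")

print(codes[!substring_hit])  # removes too much
print(codes[!exact_hit])      # removes only the two listed values
print(codes[!suffix_hit])     # removes any value ending in "student2"
```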

Upvotes: 0

Ahorn

Reputation: 3876

We can use grepl("student2$", example$Student.Code). The $ anchors the match to the end of the string, so "student23" no longer matches:

library(tidyverse)
example <- tibble::tribble(
             ~Student.Code, ~Values,
        "canada.student12",      2L,
         "canada.student2",      3L,
        "canada.student23",      5L,
             "US.student2",      6L,
            "US.student32",      2L,
           "Aus.student87",    645L,
        "Turkey.student25",      4L
        )

example$Student.Code
grepl("student2$", example$Student.Code)
[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

example %>% 
  filter(!grepl("student2$", Student.Code))

# A tibble: 5 x 2
  Student.Code     Values
  <chr>             <int>
1 canada.student12      2
2 canada.student23      5
3 US.student32          2
4 Aus.student87       645
5 Turkey.student25      4
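The same end-of-string anchor also works in base R without the tidyverse. Below is a minimal sketch that wraps it in a small helper; the function name drop_student and its arguments are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: drop rows whose code ends in "student<n>"
drop_student <- function(df, n, col = "Student.Code") {
  # Build e.g. "student2$"; the $ anchors the match to the end of the string
  pattern <- paste0("student", n, "$")
  df[!grepl(pattern, df[[col]]), , drop = FALSE]
}

# Same data as in the question, as a plain data.frame
example <- data.frame(
  Student.Code = c("canada.student12", "canada.student2", "canada.student23",
                   "US.student2", "US.student32", "Aus.student87",
                   "Turkey.student25"),
  Values = c(2L, 3L, 5L, 6L, 2L, 645L, 4L),
  stringsAsFactors = FALSE
)

kept <- drop_student(example, 2)
print(kept)
```

Building the pattern with paste0() keeps the anchor in one place, so the same helper can be reused for another number, assuming every code ends with the student number.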

Upvotes: 4
