Reputation: 792
I want to remove rows containing the text "student2". However, I don't want to remove rows like "student22", "student23", etc. For example:
Student.Code Values
1 canada.student12 2
2 canada.student2 3 # remove
3 canada.student23 5 # keep
4 US.student2 6 # remove
5 US.student32 2
6 Aus.student87 645
7 Turkey.student25 4 #keep
I used the code grepl("student2", example$Student.Code, fixed = TRUE)
but it also finds (and so removes) rows like "student23".
Upvotes: 0
Views: 65
Reputation: 21400
Data:
df <- data.frame(
Student = c("canada.student12", "canada.student2", "canada.student23","US.student2", "US.student32", "Aus.student87", "Turkey.student25"),
Value = c(2,3,5,6,2,654,5)
)
Solution: (in base R)
The idea is to use grepl to match those values where the number 2 occurs at a word boundary, written \\b in an R regex string, and to exclude those rows with the negation operator !:
df[!grepl("student2\\b", df$Student),]
Student Value
1 canada.student12 2
3 canada.student23 5
5 US.student32 2
6 Aus.student87 654
7 Turkey.student25 5
Alternatively, you can also go the opposite way and match those patterns that you want to keep:
df[grepl("student(?=\\d{2,})", df$Student, perl = TRUE),]
Here, the idea is to use a positive lookahead to match student if and only if it is immediately followed by at least two digits (\\d{2,}). This also drops any row where student is followed by a single digit other than 2, which does not occur in this data. (Note that lookahead and lookbehind require perl = TRUE.)
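As a quick check of the two ideas above, here is a minimal sketch that compares the word-boundary pattern with a negative-lookahead variant. The negative-lookahead pattern student2(?!\\d) is my own illustrative alternative, not something from the answer; the vector name students and the two result names are likewise made up for the example.

```r
# Student codes from the question
students <- c("canada.student12", "canada.student2", "canada.student23",
              "US.student2", "US.student32", "Aus.student87", "Turkey.student25")

# TRUE where "student2" is followed by a word boundary
drop_boundary <- grepl("student2\\b", students)

# Illustrative alternative: TRUE where "student2" is not followed by a digit
# (lookahead needs the PCRE engine, hence perl = TRUE)
drop_lookahead <- grepl("student2(?!\\d)", students, perl = TRUE)

# Both patterns flag the same rows for this data
identical(drop_boundary, drop_lookahead)

# Codes that would be kept
students[!drop_boundary]
```

For this data both patterns flag only canada.student2 and US.student2, so either can be negated to keep the other five rows.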
Upvotes: 3
Reputation: 1438
If you have a variable with an exact value you want to remove, don't use grep or grepl.
example <- tibble::tribble(
~Student.Code, ~Values,
"canada.student12", 2L,
"canada.student2", 3L,
"canada.student23", 5L,
"US.student2", 6L,
"US.student32", 2L,
"Aus.student87", 645L,
"Turkey.student25", 4L
)
example <- example[example$Student.Code != "canada.student2",]
# or, in dplyr (namespaced so that stats::filter is not called by accident)
example <- dplyr::filter(example, Student.Code != "canada.student2")
# for multiple values
example <- dplyr::filter(example, !(Student.Code %in% c("canada.student2", "US.student2")))
fixed = TRUE does not help here because all it means is 'treat the pattern as a literal string and search for it anywhere in the input strings', not 'match only when this string is the whole value'.
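The difference between a literal substring search and an exact comparison can be seen in a small sketch. The vector codes and the three result names are made up for the illustration; the values are taken from the question.

```r
codes <- c("canada.student2", "canada.student23", "US.student2")

# fixed = TRUE: literal substring search, so "student23" matches as well
substring_hit <- grepl("student2", codes, fixed = TRUE)
substring_hit   # TRUE TRUE TRUE

# Exact comparison: TRUE only when the whole value equals the string
exact_hit <- codes == "canada.student2"
exact_hit       # TRUE FALSE FALSE

# Exact comparison against several values at once
multi_hit <- codes %in% c("canada.student2", "US.student2")
multi_hit       # TRUE FALSE TRUE
```

Exact comparison only works when the full values to drop are known in advance; otherwise an anchored regex is needed.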
Upvotes: 0
Reputation: 3876
We can use grepl("student2$", example$Student.Code), where $ anchors the match to the end of the string:
library(tidyverse)
example <- tibble::tribble(
~Student.Code, ~Values,
"canada.student12", 2L,
"canada.student2", 3L,
"canada.student23", 5L,
"US.student2", 6L,
"US.student32", 2L,
"Aus.student87", 645L,
"Turkey.student25", 4L
)
example$Student.Code
grepl("student2$", example$Student.Code)
[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE
example %>%
filter(!grepl("student2$", Student.Code))
# A tibble: 5 x 2
Student.Code Values
<chr> <int>
1 canada.student12 2
2 canada.student23 5
3 US.student32 2
4 Aus.student87 645
5 Turkey.student25 4
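The end-of-string anchor $ and the word boundary \\b give the same result on the question's data, but they can differ when more text follows the number. A minimal sketch, using hypothetical codes that are not in the question (US.student2.old is invented for the illustration):

```r
# Hypothetical codes where text may follow "student2"
codes <- c("US.student2", "US.student2.old", "US.student23")

# End-of-string anchor: "student2" must be the last thing in the value
ends_with <- grepl("student2$", codes)
ends_with      # TRUE FALSE FALSE

# Word boundary: "student2" must not be followed by a letter, digit or underscore
at_boundary <- grepl("student2\\b", codes)
at_boundary    # TRUE TRUE FALSE
```

Which one is appropriate depends on whether codes with a suffix, if any exist, should count as student2.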
Upvotes: 4