Reputation: 5169
I have the following data frame:
library(tidyverse)
ndf <- structure(list(experiment_status = c("Negative？", "Negative？",
"Negative", "Negative？", "Negative？", "Negative？"), id = 1:6), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
ndf
#> # A tibble: 6 x 2
#> experiment_status id
#> <chr> <int>
#> 1 Negative？ 1
#> 2 Negative？ 2
#> 3 Negative 3
#> 4 Negative？ 4
#> 5 Negative？ 5
#> 6 Negative？ 6
What I want to do is filter the rows, keeping only those without a question mark (?), i.e. only row 3 is preserved after the pipe.
Why did this fail?
ndf %>%
filter(!grepl("[?]", experiment_status))
What's the right way to do it?
Upvotes: 0
Views: 908
Reputation: 47300
To clean up your question marks you can use stringi::stri_trans_general. I'd suggest applying it as early as possible to your data to avoid bad surprises.
library(stringi)
ndf %>%
mutate_at("experiment_status", stri_trans_general, "latin-ascii") %>%
filter(!grepl("[?]", experiment_status)) # or filter(!grepl("\\?$", experiment_status))
# A tibble: 1 x 2
# experiment_status id
# <chr> <int>
# 1 Negative 3
No knowledge of the problematic character is needed here, and the same call also cleans up other non-ASCII punctuation or alternate character forms.
Upvotes: 1
Reputation: 79208
ndf %>%
filter(!grepl(intToUtf8(65311), experiment_status))
# A tibble: 1 x 2
experiment_status id
<chr> <int>
1 Negative 3
If you coerce the tibble to a data frame, it may print the character as its hex Unicode code point, <U+FF1F> (the fullwidth question mark). You can also use that hex value to filter:
ndf %>%
filter(!grepl(intToUtf8(0xFF1F), experiment_status))
# A tibble: 1 x 2
experiment_status id
<chr> <int>
1 Negative 3
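To find out which character you are actually dealing with, you can inspect the code points directly. This is only a minimal base-R sketch: x is a made-up stand-in for ndf$experiment_status, and it assumes the suspect character is the last one in each string.

```r
# Stand-in for ndf$experiment_status; "\uFF1F" is the fullwidth question mark
x <- c("Negative\uFF1F", "Negative")

# Flag strings that contain any character outside the ASCII range
has_non_ascii <- vapply(x, function(s) any(utf8ToInt(s) > 127L),
                        logical(1), USE.NAMES = FALSE)

# Code point of the last character of each string
last_char <- substring(x, nchar(x))
code_points <- vapply(last_char, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Print in the usual U+XXXX notation
sprintf("U+%04X", code_points)
```

Any value above 127 is not plain ASCII, which is why a pattern written with the keyboard "?" does not match it.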
Upvotes: 2
Reputation: 13125
The problem probably arose when importing a CSV file that was written on a non-English OS, so the question mark in your data is not the ASCII one.
> '？' == '?'
[1] FALSE
ndf %>% filter(!grepl('？', experiment_status))
# Trimming whitespace does not help
> trimws(ndf$experiment_status, 'both')
[1] "Negative？" "Negative？" "Negative" "Negative？" "Negative？" "Negative？"
# Change '？' to '?' using gsub
> gsub('？', '?', ndf$experiment_status)
[1] "Negative?" "Negative?" "Negative" "Negative?" "Negative?" "Negative?"
ndf %>% mutate(experiment_status_clean = gsub('？', '?', experiment_status))
# Now you are searching for a literal ?, so you need to escape it using \\
ndf %>% mutate(experiment_status_clean = gsub('？', '?', experiment_status)) %>%
  filter(!grepl('\\?', experiment_status_clean))
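The same idea can be written without the pipe. This is a rough base-R sketch, not a drop-in fix: it assumes the odd character is the fullwidth question mark U+FF1F (as reported in another answer), and df is a hypothetical stand-in for ndf. Using fixed = TRUE treats the pattern as a literal string, so no regex escaping is needed.

```r
# Stand-in data; "\uFF1F" is the fullwidth question mark (assumed culprit)
df <- data.frame(
  experiment_status = c("Negative\uFF1F", "Negative\uFF1F", "Negative",
                        "Negative\uFF1F", "Negative\uFF1F", "Negative\uFF1F"),
  id = 1:6,
  stringsAsFactors = FALSE
)

# Normalise the fullwidth mark to a plain ASCII "?"
df$experiment_status_clean <- gsub("\uFF1F", "?", df$experiment_status,
                                   fixed = TRUE)

# Keep rows with no literal "?" in the cleaned column
kept <- df[!grepl("?", df$experiment_status_clean, fixed = TRUE), ]
kept$id
```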
Upvotes: 1
Reputation: 33782
ndf %>%
filter(!grepl("?", experiment_status, fixed = TRUE))
But in your example I think filter(experiment_status == "Negative") would work too.
EDIT: or, since we can have "Positive" too:
ndf %>%
filter(experiment_status %in% c("Negative", "Positive"))
Upvotes: 1