Reputation: 8247
I have the following data frame:
Scripts
1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0
I want to remove the pattern '|| NA' and everything after it. I am using the following command in R to do it:
gsub("\\|| NA.*$","",df$Scripts)
But it also removes the || in the middle, between the two scripts, which I do not want. The desired output is:
1:URT : 3456 || 2: ABC: 5677
Upvotes: 1
Views: 54
Reputation: 51622
A non-regex approach:
sapply(strsplit(df$Scripts, '||', fixed = TRUE), function(i)
  paste(i[!grepl('NA', i)], collapse = '||'))
#[1] "1:URT : 3456 || 2: ABC: 5677 "
You can wrap it in trimws to get rid of leading/trailing whitespace.
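For illustration, here is a minimal sketch of the split-and-filter approach wrapped in trimws. The stricter filter `^\\s*NA\\s*:` is my own assumption, not part of the answer above; it is meant to avoid dropping a real script whose name merely contains the letters "NA". The variable names `parts`, `keep_scripts` and `cleaned` are hypothetical.

```r
# Sample data matching the question
df <- data.frame(
  Scripts = "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0",
  stringsAsFactors = FALSE
)

# Split each string on the literal "||" (fixed = TRUE disables regex)
parts <- strsplit(df$Scripts, "||", fixed = TRUE)

# Keep only pieces that do not start with "NA :" (assumed placeholder format),
# glue the rest back together, then trim leading/trailing whitespace
keep_scripts <- function(p) {
  paste(p[!grepl("^\\s*NA\\s*:", p)], collapse = "||")
}
cleaned <- trimws(sapply(parts, keep_scripts))

print(cleaned)
#[1] "1:URT : 3456 || 2: ABC: 5677"
```

Because the pieces keep their surrounding spaces after the split, collapsing with '||' restores the original " || " spacing, and only the outer whitespace needs trimming.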
Upvotes: 1
Reputation: 887981
We can use sub to match zero or more spaces (\\s*) followed by one or more | characters (since | is a regex metacharacter, we can either escape it with \\ or place it in square brackets), followed by one or more spaces, then NA and the rest of the characters, and replace the match with "":
sub("\\s*[|]+\\s+NA\\s+.*", "", df$Scripts)
#[1] "1:URT : 3456 || 2: ABC: 5677"
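Since sub is vectorized, the same call should work on a whole column. The sketch below uses a hypothetical vector `scripts` with two made-up extra rows (one without any NA entry, one with a single NA entry) to show that rows without the pattern are left alone.

```r
# Hypothetical input: the first element is from the question, the other two are invented
scripts <- c(
  "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0",
  "1:XYZ : 12 || 2: DEF: 34 || 3: GHI: 56",
  "1:URT : 3456 || NA : 0 : 0"
)

# Remove the first "|| NA ..." occurrence and everything after it
result <- sub("\\s*[|]+\\s+NA\\s+.*", "", scripts)

print(result)
#[1] "1:URT : 3456 || 2: ABC: 5677"
#[2] "1:XYZ : 12 || 2: DEF: 34 || 3: GHI: 56"
#[3] "1:URT : 3456"
```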
NOTE: In the OP's code, only the first | is escaped, so the second one acts as the regex alternation operator. Instead it should be
gsub("\\s*\\|+\\s*NA.*$", "", df$Scripts)
though gsub is not required, because the pattern runs to the end of the string and so can match at most once; sub is enough.
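To make the difference concrete, here is a small sketch comparing the two patterns on the sample string. The names `x`, `broken` and `fixed` are just for illustration.

```r
x <- "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0"

# Original pattern: "\\|" is a literal pipe, but the second "|" is alternation,
# so this reads as: a literal pipe OR " NA" followed by the rest of the string.
# gsub therefore deletes every pipe in addition to the NA tail.
broken <- gsub("\\|| NA.*$", "", x)

# Corrected pattern: both pipes are covered by "\\|+", so only a run of pipes
# that is directly followed by NA (plus everything after it) is removed.
fixed <- sub("\\s*\\|+\\s*NA.*$", "", x)

grepl("|", broken, fixed = TRUE)  # FALSE: no pipes survive in the broken result
print(fixed)
#[1] "1:URT : 3456 || 2: ABC: 5677"
```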
Another option is stringi:
library(stringi)
stri_replace(df$Scripts, regex="\\s*\\|+\\s*NA.*$", "")
#[1] "1:URT : 3456 || 2: ABC: 5677"
df <- structure(list(Scripts = "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0"), .Names = "Scripts", row.names = c(NA,
-1L), class = "data.frame")
Upvotes: 3