Neil

Reputation: 8247

How to replace a pattern

I have the following data frame:

                                         Scripts
1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0

I want to remove the pattern '|| NA' and everything after it (that is, replace it with an empty string). I am using the following command in R to do it:

gsub("\\|| NA.*$","",df$Scripts) 

But it also removes the || in the middle, between the two scripts, which I do not want. The desired output is:

1:URT : 3456 || 2: ABC: 5677

Upvotes: 1

Views: 54

Answers (2)

Sotos

Reputation: 51622

A non-regex approach,

sapply(strsplit(df$Scripts, '||', fixed = TRUE), function(i) 
                     paste(i[!grepl('NA', i)], collapse = '||'))

#[1] "1:URT : 3456 || 2: ABC: 5677 "

You can wrap it in trimws to get rid of the leading/trailing whitespace.
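For illustration, here is a minimal sketch of the same approach with trimws applied. It uses a plain character vector x (a name introduced here, holding the sample string from the question) rather than df$Scripts. Note that grepl('NA', i) drops any piece containing the letters "NA", so this assumes no real script name contains "NA".

```r
# Sample string from the question (x is a stand-in for df$Scripts)
x <- "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0"

# Split on the literal "||" and keep only the pieces without "NA"
parts <- strsplit(x, "||", fixed = TRUE)
res <- sapply(parts, function(i) paste(i[!grepl("NA", i)], collapse = "||"))

# trimws() strips the whitespace left at both ends by the split
res <- trimws(res)
print(res)
```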

Upvotes: 1

akrun

Reputation: 887981

We can use sub to match zero or more spaces (\\s*) followed by one or more | (as it is a metacharacter, we can either escape it (\\|) or place it in square brackets ([|])), followed by one or more spaces, then NA and the rest of the characters, and replace the match with ""

sub("\\s*[|]+\\s+NA\\s+.*", "", df$Scripts)
#[1] "1:URT : 3456 || 2: ABC: 5677"

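NOTE: In the OP's code, only the first | is escaped. The second | is unescaped, so it acts as the alternation operator and the pattern matches either a literal | or ' NA' followed by the rest of the string. Instead it should be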

gsub("\\s*\\|+\\s*NA.*$", "", df$Scripts)

though gsub is not required: the pattern runs to the end of the string, so it can match only once and sub is enough.
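To see the difference between the two patterns, here is a small sketch that runs both on the sample string. The names x, op_result and fixed_result are introduced here for illustration only.

```r
# Sample string from the question
x <- "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0"

# OP's pattern: the unescaped | makes this "a literal |" OR " NA...",
# so gsub removes every pipe in the string as well as the NA tail
op_result <- gsub("\\|| NA.*$", "", x)

# Corrected pattern: both pipes are covered by \\|+, and only the
# separator that is directly followed by NA is matched
fixed_result <- sub("\\s*\\|+\\s*NA.*$", "", x)

print(op_result)
print(fixed_result)
```

With the OP's pattern the separator between the first two scripts is lost, while the corrected pattern keeps it.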


Or another option is stringi

library(stringi)
stri_replace(df$Scripts, regex="\\s*\\|+\\s*NA.*$", "")
#[1] "1:URT : 3456 || 2: ABC: 5677"

data

df <- structure(list(Scripts = "1:URT : 3456 || 2: ABC: 5677 || NA : 0 : 0 || NA : 0 : 0"), .Names = "Scripts", row.names = c(NA, 
-1L), class = "data.frame")

Upvotes: 3
