Reputation: 21
Suppose I have a string that contains the following characters:
"\"------------080209060700030309080805\""
Now I want to use the gsub function in R to remove the "\ and \" parts and keep only the following characters:
"------------080209060700030309080805\"
Could anyone help me figure out how to do this properly?
Upvotes: 0
Views: 3886
Reputation: 8760
Edit 1: Fixed bug (two backslashes required to create a backslash in a string):
s <- '\\"------------080209060700030309080805\\"'
s
gsub('\\"', "", s, fixed = TRUE)
results in
> s <- '\\"------------080209060700030309080805\\"'
> s
[1] "\\\"------------080209060700030309080805\\\""
> gsub('\\"', "", s, fixed = TRUE)
[1] "------------080209060700030309080805"
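If the string in the question is shown the way print displays it, it may contain only double quotes and no backslashes at all. A minimal sketch under that assumption (the names quoted and cleaned are purely illustrative, not from the question):

```r
# Assumption: the backslashes in the question are only print()'s escaping,
# so the string holds quote characters and no real backslashes.
quoted <- "\"------------080209060700030309080805\""

# Remove every double quote; fixed = TRUE treats the pattern literally.
cleaned <- gsub('"', "", quoted, fixed = TRUE)

# cat() shows the stored content without print()'s escaping.
cat(quoted, "\n")
cat(cleaned, "\n")
```

In this case no backslash is needed in the pattern at all, because there is no backslash in the data.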
Please note that a single backslash in an R string literal starts an escape sequence and is NOT part of the string:
> charToRaw('\\"')
[1] 5c 22
> charToRaw('\"')
[1] 22
Therefore you have to use two backslashes in the quoted string to create one backslash internally. If you print this string, the backslash is escaped again, which looks confusing:
> print('\\"')
[1] "\\\""
If you want to print the unescaped content of the string, use cat instead of print:
> cat('\\"')
\"
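A quick way to check what a string really contains is to count its characters. The following is a small sketch of that idea; the variable names one_char and two_chars are made up for illustration:

```r
# nchar() counts the characters actually stored in the string,
# independent of how print() escapes them for display.
one_char  <- '\"'    # escape sequence for a double quote: one character
two_chars <- '\\"'   # escaped backslash followed by a quote: two characters

nchar(one_char)
nchar(two_chars)

# writeLines() shows the raw content, like cat(), and adds a trailing newline.
writeLines(two_chars)
```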
For more, see the help in R: ?"'":
Character constants
Single and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes.
Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.
Single quotes need to be escaped by backslash in single-quoted strings, and double quotes in double-quoted strings.
\n          newline
\r          carriage return
\t          tab
\b          backspace
\a          alert (bell)
\f          form feed
\v          vertical tab
\\          backslash \
\'          ASCII apostrophe '
\"          ASCII quotation mark "
\`          ASCII grave accent (backtick) `
\nnn        character with given octal code (1, 2 or 3 digits)
\xnn        character with given hex code (1 or 2 hex digits)
\unnnn      Unicode character with given code (1--4 hex digits)
\Unnnnnnnn  Unicode character with given code (1--8 hex digits)
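A few of these escapes can be tried directly. This is only an illustrative sketch; tabbed and same_letter are hypothetical names chosen for the example:

```r
# An escape sequence is typed as several characters but stored as one.
tabbed <- "a\tb"
nchar(tabbed)        # counts a, the tab, and b
cat(tabbed, "\n")

# Two typed backslashes store a single backslash.
nchar("\\")

# Octal, hex, and Unicode escapes can all spell the same letter.
same_letter <- c(octal = "\101", hex = "\x41", unicode = "\u0041") == "A"
same_letter
```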
Upvotes: 5
Reputation: 521279
string <- "\\------------080209060700030309080805\\"
string <- gsub("^\\\\(.*)\\\\$", "\\1", string)
Notes: The pattern I used is ^\\(.*)\\$, which matches everything between a leading and a trailing backslash. It therefore only matches strings that both begin and end with a backslash. Also, we use four backslashes (\\\\) to represent one literal backslash in the gsub() pattern: we need to escape twice, once for R and a second time for the regex engine.
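The double escaping can be reduced in two ways, sketched here as alternatives rather than as part of the answer above. The raw-string form assumes R 4.0.0 or later; stripped_raw and stripped_fixed are illustrative names:

```r
string <- "\\------------080209060700030309080805\\"

# Raw string (assumes R >= 4.0.0): backslashes are not escape characters
# inside r"(...)", so only the regex-level escaping remains.
stripped_raw <- gsub(r"(^\\(.*)\\$)", r"(\1)", string)

# fixed = TRUE: no regex at all, but this removes every backslash,
# not only the leading and trailing ones.
stripped_fixed <- gsub("\\", "", string, fixed = TRUE)

cat(stripped_raw, "\n")
cat(stripped_fixed, "\n")
```

The two results agree here only because the example string has no backslashes in the middle; the anchored regex is the safer choice when inner backslashes must be kept.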
Upvotes: 1