I'm trying to replace two parts of a string using replace-regexp-in-string, but I can only get one part to work at a time.
Here is an example where I want to remove the # and spaces from the beginning and the newline from the end of the string. What am I doing wrong when I combine the two calls into one expression?
;; Test string
(setq inputStr "## Header Stuff
")
;; This doesn't trim the newline
(setq header
      (replace-regexp-in-string "^[#\s]*\\|\n$" "" inputStr))
;; Each match done separately works though
(setq header
      (replace-regexp-in-string "^[#\s]*" "" inputStr))
(setq header
      (replace-regexp-in-string "\n$" "" header))
header
"Header Stuff"
UPDATE: the problem seems to be with the first expression. For example, this replaces both the newline and "S" with "X":
(replace-regexp-in-string "S\\|\n$" "X" inputStr)
Upvotes: 3
It looks like replace-regexp-in-string has some unexpected behavior with regexps which match the empty string. The following regexp does what you would expect (note the + quantifier in place of *):
(let ((input-string "## Header Stuff
"))
  (replace-regexp-in-string "\\`[#\s]+\\|\n*\\'" "" input-string))
The reason lies in the internal implementation of replace-regexp-in-string, which you can look up using M-x find-function. In pseudocode, it does approximately the following:
Given a regexp, a replacement, and a string:
- Set l to the length of the string and start to 0. Create an empty stack called matches to accumulate pieces of the new string.
- As long as start is less than l and regexp matches somewhere within string, do the following:
  - Extract the portion of string that matched the regexp, and call it str.
  - Replace regexp with replacement, within the shorter string str (this is important).
  - Push the following two fragments of the new string onto the matches stack:
    - the unmatched initial portion of string, from start to the beginning of the match
    - the substring str, in which the match for regexp has now been replaced by replacement
  - Set start to the end of the matched portion and repeat.
- Finally, join up the string fragments on the matches stack in reverse order and return the result.
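The steps above can be sketched in Emacs Lisp roughly as follows. This is a simplified illustration, not the actual source of replace-regexp-in-string: the name my/naive-replace-regexp is hypothetical, the optional arguments of the real function are left out, the replacement is inserted literally, and the one-character advance on an empty match is an assumption added here so that the loop always terminates.

```elisp
(defun my/naive-replace-regexp (regexp replacement string)
  "Replace matches of REGEXP in STRING with REPLACEMENT.
Hypothetical, simplified sketch of the loop described above."
  (let ((l (length string))
        (start 0)
        (matches '()))
    (save-match-data
      (while (and (< start l) (string-match regexp string start))
        (let* ((mb (match-beginning 0))
               ;; Assumption: step past an empty match by one character.
               (me (if (= mb (match-end 0)) (min l (1+ mb)) (match-end 0)))
               (str (substring string mb me)))
          ;; Unmatched text between START and the beginning of the match.
          (push (substring string start mb) matches)
          ;; Re-match REGEXP against the short substring STR only, then
          ;; replace whatever that second match found (this is the key step).
          (push (if (string-match regexp str)
                    (replace-match replacement t t str)
                  str)
                matches)
          (setq start me)))
      ;; Leftover text after the last match.
      (push (substring string start l) matches)
      (apply #'concat (nreverse matches)))))

;; With the regexp from the question, the trailing newline survives:
(my/naive-replace-regexp "^[#\s]*\\|\n$" "" "## Header Stuff\n")
;; => "Header Stuff\n"
```

Because the second string-match sees only str, anchors such as ^ are evaluated against the substring rather than against the original string, which is why the two matches can disagree.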
The problem with your original regexp happens in the replacement step of the loop, where the regexp is applied a second time to the short string str. Even though the regexp correctly matches the newline at the end of the complete string "## Header Stuff\n", when it is matched again against the one-character string "\n", the first branch of the alternative, which can match an empty string, takes priority over the second. It replaces the empty string with the empty string and so fails to remove the trailing newline.
This is arguably a bug in replace-regexp-in-string, but it also shows how tricky regexp semantics can be, especially when empty strings are involved. To me, the workaround solution is easier to read and understand:
(let ((input-string "## Header Stuff
"))
  (setq input-string (replace-regexp-in-string "\\`[#\s]*" "" input-string))
  (setq input-string (replace-regexp-in-string "\n*\\'" "" input-string))
  input-string)
If you have a very recent Emacs (pretest 24.4 or higher), you can also use the string-trim-right function from the built-in subr-x library:
(require 'subr-x)
(let ((input-string "## Header Stuff
"))
  (string-trim-right (replace-regexp-in-string "\\`[#\s]*" "" input-string)))
By the way, I was surprised to find out while investigating this that \s in Emacs strings is just a different way of writing the space character. If you want regexp behavior similar to Perl's \s wildcard, you might want to use "\\s-" (match any character with whitespace syntax), or "[[:space:]]".
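A quick way to check this is sketched below. Because both "\\s-" and "[[:space:]]" consult the current syntax table, the example binds the standard syntax table explicitly; that choice is an assumption made here for reproducibility, and other major modes may classify characters such as newline differently.

```elisp
;; "\s" in a Lisp string is read as a plain space character.
(string= "\s" " ")
;; => t

(with-syntax-table (standard-syntax-table)
  (list
   ;; Both forms match the run of space and tab characters here.
   (replace-regexp-in-string "\\s-+" "_" "a \t b")
   (replace-regexp-in-string "[[:space:]]+" "_" "a \t b")))
;; => ("a_b" "a_b")
```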
Upvotes: 2