Golang replace any and all newline characters

Question

Usually, when I'm replacing newlines I jump to Regexp, like in this PHP

preg_replace('/\R/u', "\n", $String);

Because I know that to be a very durable way to replace any kind of Unicode newline (be it \n, \r, \r\n, etc.)

I was trying to do something like this in Go as well, but I get

error parsing regexp: invalid escape sequence: \R

On this line

msg = regexp.MustCompilePOSIX("\\R").ReplaceAllString(html.EscapeString(msg), "<br>\n")

I tried using (?:(?>\r\n)|\v) from https://stackoverflow.com/a/4389171/728236, but it looks like Go's regex implementation doesn't support that either, panicking with invalid or unsupported Perl syntax: '(?>'

What's a good, safe way to replace newlines in Go, Regex or not?

I see this answer here, "Golang: Issues replacing newlines in a string from a text file", saying to use \r?\n, but I'm hesitant to believe that it would get all Unicode newlines, mainly because of this question that has an answer listing many more newline codepoints than the 3 that \r?\n covers.

Wiktor Stribiżew · Accepted Answer

You may "decode" the \R pattern as

U+000DU+000A|[U+000AU+000BU+000CU+000DU+0085U+2028U+2029]

See the Java regex docs explaining the \R shorthand:

Linebreak matcher
\R  Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

In Go, you may use the following:

func removeLBR(text string) string {
    re := regexp.MustCompile(`\x{000D}\x{000A}|[\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}]`)
    return re.ReplaceAllString(text, ``)
}
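
As a usage sketch, the same pattern can be compiled once at package level and used to turn every linebreak into "\n" instead of deleting it, which is closer to what the PHP snippet in the question does. The names lineBreak and normalizeNewlines are hypothetical, chosen only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// lineBreak is the expanded \R pattern, compiled once at startup.
// The CRLF alternative comes first so "\r\n" is matched as a single break.
var lineBreak = regexp.MustCompile(`\x{000D}\x{000A}|[\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}]`)

// normalizeNewlines (hypothetical helper) replaces each linebreak
// sequence in text with a single "\n".
func normalizeNewlines(text string) string {
	return lineBreak.ReplaceAllString(text, "\n")
}

func main() {
	in := "a\r\nb\rc\u2028d\u0085e"
	fmt.Printf("%q\n", normalizeNewlines(in)) // prints "a\nb\nc\nd\ne"
}
```

Compiling at package level avoids rebuilding the regexp on every call, which matters if the function runs per message.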

Here is a Go demo.

Some of the Unicode codes can be replaced with regex escape sequences supported by Go regexp:

re := regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)
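
Since the question also asks about non-regex options, here is a minimal sketch using strings.NewReplacer from the standard library, assuming the same seven linebreak sequences that \R covers. The names newlineReplacer and replaceNewlines are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// newlineReplacer maps each linebreak sequence to "\n".
// "\r\n" is listed first: NewReplacer compares old strings in argument
// order, so a CRLF pair becomes one "\n" rather than two.
var newlineReplacer = strings.NewReplacer(
	"\r\n", "\n",
	"\r", "\n",
	"\v", "\n",
	"\f", "\n",
	"\u0085", "\n",
	"\u2028", "\n",
	"\u2029", "\n",
)

// replaceNewlines (hypothetical helper) normalizes all linebreaks in text.
func replaceNewlines(text string) string {
	return newlineReplacer.Replace(text)
}

func main() {
	fmt.Printf("%q\n", replaceNewlines("a\r\nb\vc\u2029d")) // prints "a\nb\nc\nd"
}
```

A Replacer is safe for concurrent use, so a single package-level value can be shared between goroutines.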
