Carl Younger

Reputation: 3080

Negative Lookbehind: Match a substring that's not preceded by one of a set of characters

Question

How do you define a regular expression that will match each substring that fits a given pattern, but only when it is not preceded by one of a set of characters?

Case

I have a function that removes hardcoded newlines from strings of text, so they will reflow properly. The function works fine, except that it does not handle hyphenation intelligently.

This is a simplified version of what I have for hyphens.

function (string) { return string.replace(/-\n/g, "") }

It works on things it should work on, no problem. So this...

A hyphen-
ated line.

...becomes...

A hyphenated line.

But it goes too far, and doesn't handle dashes properly, so these examples get garbled:

"""
Mary Rose sat on a pin -
Mary rose.

Mary Rose sat on a pin --
Mary rose.
"""

The function should only consider the -\n pattern a match if it's not preceded by a hyphen or any kind of whitespace character.

Upvotes: 0

Views: 70

Answers (2)

Casimir et Hippolyte

Reputation: 89639

You can change your pattern to this:

function (string) { return string.replace(/\b-\n/g, "") }

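The word boundary \b matches the position between a word character and a non-word character, so the hyphen is removed only when it directly follows a word character (a letter, digit, or underscore). A hyphen that follows whitespace or another hyphen is left alone.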
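As a quick check, here is a minimal sketch that runs the \b pattern against the examples from the question. The function name joinHyphenatedWords is hypothetical; the original snippet is an anonymous function.

```javascript
// Hypothetical name, chosen for illustration only.
function joinHyphenatedWords(text) {
  // \b requires a word character (ASCII letter, digit, or underscore)
  // immediately before the hyphen.
  return text.replace(/\b-\n/g, "");
}

// Hyphenated word: the hyphen and newline are removed.
console.log(JSON.stringify(joinHyphenatedWords("A hyphen-\nated line.")));

// Dashes preceded by a space or another hyphen: left untouched.
console.log(JSON.stringify(joinHyphenatedWords("Mary Rose sat on a pin -\nMary rose.")));
console.log(JSON.stringify(joinHyphenatedWords("Mary Rose sat on a pin --\nMary rose.")));
```

Note that \b is stricter than the rule stated in the question: a hyphen that follows punctuation such as a closing parenthesis, or a non-ASCII letter such as "é", is also left alone, because JavaScript's \w covers only ASCII letters, digits, and the underscore.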

Upvotes: 2

anubhava

Reputation: 786291

You can use:

var repl = string.replace(/([^\s-])-\n/g, "$1");

RegEx Demo
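The capture-group pattern above consumes the preceding character and puts it back with $1. Since the question title asks about negative lookbehind, here is a sketch comparing it with a lookbehind version. This assumes an engine that supports lookbehind assertions (added to JavaScript in ES2018); the function names are hypothetical.

```javascript
// Hypothetical helper names, for comparison only.
function joinWithCapture(text) {
  // Match one character that is neither whitespace nor a hyphen,
  // then restore it in the replacement via $1.
  return text.replace(/([^\s-])-\n/g, "$1");
}

function joinWithLookbehind(text) {
  // Assert that the preceding character is not whitespace or a hyphen,
  // without consuming it, so the replacement can be an empty string.
  return text.replace(/(?<![\s-])-\n/g, "");
}

const samples = [
  "A hyphen-\nated line.",
  "Mary Rose sat on a pin -\nMary rose.",
  "Mary Rose sat on a pin --\nMary rose.",
];

for (const sample of samples) {
  console.log(JSON.stringify(joinWithCapture(sample)));
  console.log(JSON.stringify(joinWithLookbehind(sample)));
}
```

The two versions agree on the examples from the question. They differ at the very start of the string: the capture-group pattern needs a preceding character to match, while the negative lookbehind also succeeds when nothing precedes the hyphen.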

Upvotes: 2
