Ariel Frischer

Reputation: 1592

Equivalent regex without the lookbehind assertions that iOS Safari does not support

This regex:

var text = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't."
// break the string up into sentences based on punctuation and quotation marks
var tokens = text.match(/(?<=\s+|^)[\"\'\‘\“\'\"\[\(\{\⟨](.*?[.?!])(\s[.?!])*[\"\'\’\”\'\"\]\)\}\⟩](?=\s+|$)|(?<=\s+|^)\S(.*?[.?!])(\s[.?!])*(?=\s+|$)/g);

breaks on iOS Safari because lookbehind assertions ((?<= ) and (?<! )) are not supported there. Is there an equivalent (or similar) regex for sentence tokenization that I can use? Preferably it should also avoid the other iOS Safari compatibility issues listed here: (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#assertions).

Upvotes: 1

Views: 2791

Answers (2)

Giorgi Gvimradze

Reputation: 2129

The issue was the ?<= lookbehind. If you can replace it with something else (in my case, ?!), it might work.

Upvotes: -1

anubhava

Reputation: 785856

Here is a version of your regex that breaks the input into sentences without using any lookbehind assertions:

/(?:\s|^)(?:["'‘“'"\[({⟨].*?[.?!](?:\s[.?!])*["'’”'"\])}⟩]|\S.*?[.?!](?:\s[.?!])*)(?=\s|$)/gm
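A minimal usage sketch, assuming the pattern above is used unchanged. Because (?:\s|^) consumes the whitespace that the lookbehind only asserted, every match after the first starts with one whitespace character, so each match is trimmed here. The names sentenceRe, sample, and sentences are illustrative and not part of the original post.

```javascript
// Lookbehind-free pattern from this answer, copied unchanged.
var sentenceRe = /(?:\s|^)(?:["'‘“'"\[({⟨].*?[.?!](?:\s[.?!])*["'’”'"\])}⟩]|\S.*?[.?!](?:\s[.?!])*)(?=\s|$)/gm;

var sample = "Did he mind? He paid a lot for it. In any case, this isn't true!";

// match() returns null when nothing matches, so fall back to an empty array.
var sentences = (sample.match(sentenceRe) || []).map(function (s) {
  return s.trim(); // drop the whitespace consumed by (?:\s|^)
});

console.log(sentences);
// [ "Did he mind?", "He paid a lot for it.", "In any case, this isn't true!" ]
```

Trimming keeps the result the same shape as the original lookbehind version, whose matches never included the surrounding whitespace.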


Please keep in mind that your regex may break on sentences containing words that end with a dot, such as Jr., Sr., Mr., and a few other cases like that.
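The abbreviation caveat can be reproduced with a short sketch. It assumes the lookbehind-free pattern from this answer; abbrevSample and pieces are hypothetical names. Any dot followed by whitespace is treated as the end of a sentence, so Mr. and Jr. split their sentences, while the dots inside cheapsite.com and 1.5 do not.

```javascript
// Lookbehind-free pattern from this answer, copied unchanged.
var sentenceRe = /(?:\s|^)(?:["'‘“'"\[({⟨].*?[.?!](?:\s[.?!])*["'’”'"\])}⟩]|\S.*?[.?!](?:\s[.?!])*)(?=\s|$)/gm;

var abbrevSample = "Mr. Smith bought cheapsite.com for 1.5 million dollars. Adam Jones Jr. thinks he didn't.";

var pieces = (abbrevSample.match(sentenceRe) || []).map(function (s) {
  return s.trim(); // drop the whitespace consumed by (?:\s|^)
});

console.log(pieces);
// [ "Mr.", "Smith bought cheapsite.com for 1.5 million dollars.",
//   "Adam Jones Jr.", "thinks he didn't." ]
```

Two sentences come back as four pieces, which is the kind of breakage to expect with abbreviations.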

Upvotes: 0
