dx_over_dt

Reputation: 14318

Regex to match/replace leading tabs without lookbehind

I am trying to match each \t in the leading whitespace of a line so I can replace them with two spaces. This is trivial with an unbounded (i.e., variable-length) lookbehind.

text.replace(/(?<=^\s*)\t/gm, '  ')

Unfortunately, this code is running on iOS, and for some reason, Safari and iOS have yet to implement lookbehinds, let alone unbounded lookbehinds.

I know there are workarounds for lookbehinds, but I can't seem to get the ones I've looked at to work.

I would rather not capture any characters aside from each tab, but if there's no other way, I could capture characters around the tabs in capture groups and add $1, etc, to my replacement string.
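For reference, the capture-group idea could be sketched roughly as below. A single pass can only replace the first remaining leading tab on each line, so this sketch repeats the replacement until the string stops changing. The function name `expandLeadingTabsLoop` is made up for illustration, and the sketch assumes leading whitespace consists only of spaces and tabs.

```javascript
// Hypothetical helper: replaces tabs in leading whitespace with two spaces
// using a capture group instead of a lookbehind.
function expandLeadingTabsLoop(text) {
  // Group 1 holds the run of spaces before the first remaining leading tab.
  const firstLeadingTab = /^( *)\t/gm;
  let previous;
  let current = text;
  do {
    previous = current;
    // Each pass converts at most one tab per line; '$1' restores the spaces.
    current = current.replace(firstLeadingTab, '$1  ');
  } while (current !== previous);
  return current;
}

console.log(JSON.stringify(expandLeadingTabsLoop('\t\ta\n  \t  b\n')));
```

The loop runs once per leading tab on the most heavily indented line, plus one final pass that finds nothing to change.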

Example test code

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

// throws error in iOS, which does not support lookbehinds
// const regex = /(?<=^\s*)\t/gm;
const regex = /to-do/gm;

const result = text.replace(regex, '  ')

console.log(`Text: ${text}`)
console.log(`Expected: ${expected}`)
console.log(`Result: ${result}`)
console.log(JSON.stringify([ expected, result ], null, 2))

if (result === expected) {
  console.info('Success! 😃')
} else {
  console.error('Failed 😞')
}

Update

A less than ideal workaround would be to use two regexes and a replacer function.

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

const result = text.replace(/^\s*/gm, m => m.replace(/\t/g, '  '))

if (result === expected) {
  console.info('Success! 😃')
} else {
  console.error('Failed 😞')
}

Again, less than ideal. I'm a purist.

Upvotes: 4

Views: 187

Answers (2)

Cary Swoveland

Reputation: 110725

Here's a Ruby solution, should a reader wish to post a JavaScript solution based on it.

rgx = /[a-z].*|\t/
str.gsub(rgx) { |s| s[0] == "\t" ? '  ' : s }

where str holds the string that is to be modified.

The regular expression is a two-part alternation:

[a-z]  # match a lower-case letter
.*     # match zero or more characters (to end of line)
|      # or
\t     # match a tab

Each match is passed to the "block" ({ |s| ...}) and is held by the block variable s. If the first character of the match is a tab, two spaces are returned; otherwise s is returned unchanged. Once [a-z].* has matched, there are no further matches on that line, because the remainder of the line (possibly including tabs) has been consumed.

In Python a lambda would be used in place of Ruby's block, something like

lambda m: '  ' if m.group()[0] == "\t" else m.group()
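A JavaScript port of the same alternation might look like the sketch below. The function name `expandLeadingTabsAlt` is made up for illustration. Like the Ruby version, it assumes the first non-whitespace character on each line is a lower-case letter; swapping `[a-z]` for `\S` would be one way to relax that.

```javascript
// Hypothetical port of the Ruby alternation to JavaScript.
function expandLeadingTabsAlt(text) {
  // First branch: swallow everything from the first lower-case letter to the
  // end of the line ('.' does not match a newline), returned unchanged.
  // Second branch: a tab that was not swallowed, i.e. a leading tab.
  return text.replace(/[a-z].*|\t/g, (m) => (m === '\t' ? '  ' : m));
}

console.log(JSON.stringify(expandLeadingTabsAlt('\t\ta\n  \t  b\n')));
```

This stays a single regex and a single pass, at the cost of a replacer function rather than a plain replacement string.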

Upvotes: 0

anubhava

Reputation: 785671

You can use this JavaScript solution, which does not involve a lookbehind:

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`;

var repl = text.replace(/^[ \t]+/mg, g => g.replace(/\t/g, '  '));

console.log(repl);

Upvotes: 1
