Will Martin

Reputation: 4180

Notepad++ regular expressions: Matching multiple segments separated by token

For many years, I used a very handy trick in Notepad++ and SciTE which allowed me to split a given string up by a token. For example, given this input:

first name|last name
bob|johansen
scarlet|scarnetti
nelson|huguemeyer

I could then execute a regular expression to turn it into an HTML table. The search string would be:

(.+)|(.+)

And the replace string would be:

<tr><td>\1</td><td>\2</td></tr>

The end result would be:

<tr><td>first name</td><td>last name</td></tr>
<tr><td>bob</td><td>johansen</td></tr>
<tr><td>scarlet</td><td>scarnetti</td></tr>
<tr><td>nelson</td><td>huguemeyer</td></tr>

When I had spreadsheets that were hundreds of lines long and needed to be converted into HTML format, this was extremely useful!

Unfortunately, in recent versions of Notepad++ the regular expression engine appears to have changed, and my search pattern above no longer works. The first occurrence of (.+) matches everything from the beginning of the line to the end of the line, ignoring the intervening | characters.

I've flailed helplessly through a variety of different search patterns trying to find one that will get everything up to the first |, then everything after it. In longer examples, there might be five or six different segments separated by | characters.

So far, my efforts have failed. What do I need to do to split a line of input at specific tokens via regular expressions in Notepad++?

Upvotes: 1

Views: 2078

Answers (1)

Wiktor Stribiżew

Reputation: 626950

In the current NPP versions, | is an alternation operator. It must be escaped outside a character class to match a literal pipe symbol.
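The alternation behaviour can be reproduced outside the editor. The sketch below uses Python's `re` module purely as an illustration; it is an assumption that this is a fair stand-in, since Notepad++ uses a different regex engine, but both treat an unescaped | as "or". The sample line comes from the question; the variable names are made up for this example.

```python
import re

line = "bob|johansen"

# Unescaped: "|" means "or", so the left branch (.+) alone
# consumes the whole line and Group 2 never participates.
unescaped = re.match(r"(.+)|(.+)", line)
whole_match = unescaped.group(1)
second_group = unescaped.group(2)

# Escaped: "\|" matches a literal pipe, so the line splits as intended.
row = re.sub(r"(.+)\|(.+)", r"<tr><td>\1</td><td>\2</td></tr>", line)

print(whole_match)   # bob|johansen
print(second_group)  # None
print(row)           # <tr><td>bob</td><td>johansen</td></tr>
```

Note that with more than one pipe on a line, the greedy first (.+) would run up to the last pipe, so this two-group pattern only suits two-column input.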

Your ^([^\|]+)\|([^\|]+)\|([^\|]+)$ will only match lines that consist of exactly three |-delimited parts.
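To see that limitation concretely, here is a small check in Python (again only as an illustration, assuming Python's `re` behaves like the editor's engine for this simple pattern). The helper name `matches_three_parts` is hypothetical.

```python
import re

# Same idea as the three-group pattern: three runs of non-pipe
# characters separated by exactly two literal pipes.
THREE_PARTS = re.compile(r"^([^|]+)\|([^|]+)\|([^|]+)$")

def matches_three_parts(line: str) -> bool:
    """Return True only if the whole line has exactly three fields."""
    return THREE_PARTS.match(line) is not None

print(matches_three_parts("a|b|c"))         # True
print(matches_three_parts("bob|johansen"))  # False, only two fields
print(matches_three_parts("a|b|c|d"))       # False, four fields
```

A pattern with a fixed number of groups has to be rewritten for every column count, which is why a per-token replacement is more flexible.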

I want to suggest a regex that implements the logic in my second comment to the question:

(^)|($)|\|

and replace with

(?{1}<tr><td>:?{2}</td></tr>:</td><td>)

Search pattern details:

  • (^) - Group 1: start of line
  • | - or
  • ($) - Group 2: end of line
  • | - or
  • \| - a literal |.

Replacement details:

  • (?{1} - If Group 1 matched,
    • <tr><td> - replace (actually, add) <tr><td> at the line start
  • :?{2} - else, if Group 2 matched,
    • </td></tr> - add </td></tr> at the line end
  • : - else, | is replaced with </td><td>
  • ) - end of the conditional replacement clause.
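The same three-way logic can be sketched in Python, where a replacement callback plays the role of the conditional replacement string. This is not what Notepad++ runs internally, just an approximation under the assumption that each line is processed separately and contains no newline; `to_table_row` and `_replace` are hypothetical names.

```python
import re

# Group 1: start of the line, Group 2: end of the line, otherwise a literal pipe.
TOKEN = re.compile(r"(^)|($)|\|")

def _replace(match: re.Match) -> str:
    # A group that matched an empty string is "" (not None),
    # so test against None rather than truthiness.
    if match.group(1) is not None:
        return "<tr><td>"      # start of line
    if match.group(2) is not None:
        return "</td></tr>"    # end of line
    return "</td><td>"         # literal pipe between fields

def to_table_row(line: str) -> str:
    """Convert one non-empty, pipe-delimited line into an HTML table row."""
    return TOKEN.sub(_replace, line)

print(to_table_row("bob|johansen"))
# <tr><td>bob</td><td>johansen</td></tr>
print(to_table_row("a|b|c|d"))
# <tr><td>a</td><td>b</td><td>c</td><td>d</td></tr>
```

Because each pipe is replaced on its own, the number of fields per line does not matter. Empty lines are left out of this sketch deliberately, since engines differ in how they treat two empty matches at the same position.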


Upvotes: 2
