ACesario

Reputation: 89

Regex: how to remove line feeds in a pipe-delimited file EXCEPT after the nth |?

I have a pipe-delimited file with 35 pipes per line. There is an expected line feed after the 35th field. For example:

FirstField|ME|HERE|PHONE|Description|.....|LastField
FirstField|YOu|THERE|PHONE|Description|.....|LastField

However, some of the data between pipes (for example, in a description field) contains line feeds, e.g.:

FirstField|Them|Where|PHONE|This contains a
LineFeed
Or two
or more|.....|LastField

The question is: how can I remove the line feeds inside any of the 35 fields, but not the one at the end of the line?

(Side note: I'm working in Notepad++ for testing.)

Upvotes: 3

Views: 99

Answers (1)

Wiktor Stribiżew

Reputation: 626758

You may leverage the Notepad++ PythonScript plug-in.

See instructions on how to install a working version here and create the following script file:

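# Strip every line break (CRLF, LF, or CR) inside one matched record.
def repl(match):
    return match.group(0).replace("\r\n", "").replace("\n", "").replace("\r", "")

# Run the regex over the document open in the editor and rewrite each match.
editor.rereplace(r'^[^|]*(?:\|[^|]*){36}$', repl)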

If you name the script file as replace_lbr_inblock.py, you will be able to call it by selecting Plugins -> Python Script -> Scripts -> replace_lbr_inblock.

The regex ^[^|]*(?:\|[^|]*){36}$ matches

  • ^ - start of the line
  • [^|]* - zero or more chars other than | (line breaks included, which is what lets a match span several physical lines)
  • (?:\|[^|]*){36} - 36 sequences of a | followed by zero or more chars other than |
  • $ - end of line.
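
To try the pattern outside Notepad++, here is a minimal sketch using Python's standard re module instead of the PythonScript editor object. The function name join_broken_records and its pipes_per_record parameter are hypothetical, introduced only for illustration, and the sample uses 2 pipes per record rather than 36 to keep it short. It assumes records are separated by plain LF line endings.

```python
import re


def join_broken_records(text, pipes_per_record):
    """Remove line breaks inside records that contain a fixed number of pipes.

    Hypothetical helper: same pattern shape as above, with the repeat
    count turned into a parameter. Assumes LF-separated records.
    """
    # re.MULTILINE makes ^ and $ match at line boundaries, not only at
    # the start and end of the whole string.
    pattern = re.compile(
        r'^[^|]*(?:\|[^|]*){%d}$' % pipes_per_record,
        re.MULTILINE,
    )

    def repl(match):
        # Drop CRLF, LF, and CR found inside the matched record.
        return (match.group(0)
                .replace("\r\n", "")
                .replace("\n", "")
                .replace("\r", ""))

    return pattern.sub(repl, text)


# Three records with 2 pipes each; the second is broken across two lines.
sample = "A|one|x\nB|two\nlines|y\nC|three|z"
print(join_broken_records(sample, 2))
# A|one|x
# B|twolines|y
# C|three|z
```

Note that the broken pieces are joined with nothing between them ("two" and "lines" become "twolines"), matching the replacement with an empty string in the script above. In this sketch, a newline after the very last record would also be swallowed, because the final [^|]* can run to the end of the text.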

Upvotes: 5
