Richard Hum

Reputation: 651

How to turn newlines into indents with regular expressions

I have a list that looks something like this:

Item 1


Subitem 1

Item 2

Item 3


Subitem 1


Subitem 2



Subsubitem 1

Item 4

Every top-level item has one newline before it, each subitem has two, each sub-subitem has three, and so on. I want to convert it to a format like this:

Item 1
    Subitem 1
Item 2
Item 3
    Subitem 1
    Subitem 2
        Subsubitem 1
Item 4

The substitutions I have been using in Vim are these:

For the first level:

%s/^$\n\(\t\w\)/\t\1/g

For the second level:

%s/^$\n\(\t\t\w\)/\t\1/g

and so on.

Is there a better way to do this, without having to run a different regex for each level of the list? I'm trying to do this in Vim, but any *nix solution is fine with me.

Upvotes: 0

Views: 84

Answers (4)

Peter Rincker

Reputation: 45087

This can be accomplished with :s and sub-replace-expression (\=).

:%s/^\n\+/\=repeat("\t",len(submatch(0))-1)/

Basically, we count the \n's in each run and replace the run with one fewer \t's than that.

  • :%s/^\n\+/.../ finds each sequence of \n's that starts at the beginning of a line.
  • :%s/.../\={expr}/ replaces the match with the result of evaluating the expression {expr}.
  • submatch({n}) returns the n'th submatch; submatch(0) is the whole match, the same as \0 or & in an ordinary replacement.
  • repeat({str}, {num}) returns the string {str} repeated {num} times.
  • len({str}) returns the length of the string {str}.
  • len(submatch(0))-1 subtracts one from the length, as we want to keep the "good lines" on separate lines.

For more help see:

:h :s
:h sub-replace-expression
:h repeat()
:h len()
:h submatch()
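
For anyone working outside Vim, the same idea (measure each run of newlines, then emit one fewer tab) can be sketched in Python. This is only a minimal sketch, not part of the Vim command above: the names newlines_to_tabs and SAMPLE are made up for illustration, and it assumes the input follows the layout shown in the question (one blank line before a top-level item, one more for each nesting level).

```python
import re

# Hypothetical sample mirroring the question's layout: one blank line before a
# top-level item, two before a subitem, three before a sub-subitem.
SAMPLE = (
    "Item 1\n\n\nSubitem 1\n\n"
    "Item 2\n\n"
    "Item 3\n\n\nSubitem 1\n\n\nSubitem 2\n\n\n\nSubsubitem 1\n\n"
    "Item 4\n"
)


def newlines_to_tabs(text):
    """Replace each run of N blank lines with N - 1 tabs."""
    # (?m) lets ^ match at the start of every line, like ^ in the Vim pattern.
    # The callback plays the role of \=repeat("\t", len(submatch(0)) - 1).
    return re.sub(r"(?m)^\n+", lambda m: "\t" * (len(m.group(0)) - 1), text)
```

Calling newlines_to_tabs(SAMPLE) returns the list with one tab per nesting level. Input that already has no blank lines passes through unchanged, since the pattern only matches empty lines.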

Upvotes: 1

lcd047

Reputation: 5851

The Perl way:

perl -0777pe 's/\n\K\n+/"\t"x(-1+length $&)/gse'

Using tr and GNU sed:

tr '\n' '\t' | sed -E 's/([^\t])\t\t/\1\n/g'

Output:

Item 1
        Subitem 1
Item 2
Item 3
        Subitem 1
        Subitem 2
                Subsubitem 1
Item 4
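
To see why the tr and sed pipeline works, here is a rough Python rendering of its two steps. It is an illustration only, under the same input assumptions as the question, and tr_then_sed is a made-up name rather than anything from the answer.

```python
import re


def tr_then_sed(text):
    """Mimic: tr '\\n' '\\t' | sed -E 's/([^\\t])\\t\\t/\\1\\n/g'."""
    # Step 1 (tr): every newline becomes a tab, so the whole input is one line
    # and a run of N newlines is now a run of N tabs.
    flat = text.replace("\n", "\t")
    # Step 2 (sed): the first two tabs after a non-tab character turn back into
    # a single newline; the remaining tabs in the run are left as indentation.
    return re.sub(r"([^\t])\t\t", r"\1\n", flat)
```

Because the first step converts every newline, a final newline in the input ends up as a trailing tab in the result.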

Upvotes: 1

Vaibhav Gupta

Reputation: 401

One thing you can do is apply the following regex repeatedly:

(?<!\n)\n\t*\n

On each pass, find and replace every occurrence of this regex:

  • First pass Replace with : \n
  • Second pass Replace with : \n\t
  • Third pass Replace with : \n\t\t
  • Fourth pass Replace with : \n\t\t\t

...and so on until there is no match for the regex anywhere.

So you don't have to change the regex on every pass, but you still have to change the replacement each time. You can write a small program that does this in a loop.
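
A small program along those lines might look like the sketch below. It is a hedged illustration, not a tested tool: indent_by_passes is a made-up name, and it assumes the input uses the layout from the question. Each pass reuses the regex above and only changes the replacement, adding one tab per pass.

```python
import re

# The regex from this answer: a newline not preceded by another newline,
# then any tabs left by the previous pass, then one more newline.
PATTERN = re.compile(r"(?<!\n)\n\t*\n")


def indent_by_passes(text):
    """Run the same regex repeatedly, widening the replacement on each pass."""
    tabs = 0  # first pass replaces with "\n", second with "\n\t", and so on
    while PATTERN.search(text):
        text = PATTERN.sub("\n" + "\t" * tabs, text)
        tabs += 1
    return text
```

Every match contains two newlines and every replacement contains one, so the number of newlines drops on each pass and the loop stops once no match is left.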

Upvotes: 0

mocman

Reputation: 11

That depends on which tool is executing the regular expression. For example, sed won't do the trick because it processes its input line by line. If you are using sed, try tr instead:

tr '\n' '\t'

Upvotes: 1
