Alexander Morland

Reputation: 6434

CLI command to remove specific newlines

I have a markdown file that contains a bunch of these blocks:

```
json :
{
  "something": "here"
}
```

I want to fix all of these so they become valid markdown, i.e.:

```json
{
  "something": "here"
}
```

How can I do that effectively across any number of files?

I have Googled around a bit and found similar issues, but have been unable to adapt their solutions to my specific need. It seems that sed is not great at multi-line matching, and the ` character is obviously also causing issues.

I've tried with

perl -pe "s/\njson :/json/g"

but that did not give any matches.

Upvotes: 1

Views: 54

Answers (1)

simbabque

Reputation: 54333

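Your substitution never matches because -p reads the input line by line, so every record ends with its newline and nothing follows it. To make your Perl program work, you need to change the input record separator $/. A simple BEGIN block that undefs it before the implicit while loop starts is enough.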

foo is your input file.

$ perl -pe 'BEGIN{undef $/} s/\njson :/json/g' foo
```json
{
  "something": "here"
}
```

Perl will now slurp in the whole file at once, which should be fine for a markdown document. Files of several gigabytes would need correspondingly more RAM, though.

Note that you need -i as well to do in-place editing; without it, the result is only printed to standard output.

$ perl -pi -e '...' *

A much shorter version is to use the -0 flag instead of the BEGIN block to set the input record separator. perlrun says this:

The special value 00 will cause Perl to slurp files in paragraph mode. Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose.


You could have diagnosed this yourself by running your program with the re 'debug' pragma, which turns on debugging output for the regex engine and shows what each match attempt is run against.

$ perl -Mre=debug -pe 's/\njson :/json/g' foo
Compiling REx "\njson :"
Final program:
   1: EXACT <\njson :> (4)
   4: END (0)
anchored "%njson :" at 0 (checking anchored isall) minlen 7 
Matching REx "\njson :" against "```%n"
Regex match can't succeed, so not even tried
```
Matching REx "\njson :" against "json :%n"
Intuit: trying to determine minimum start position...
  Did not find anchored substr "%njson :"...
Match rejected by optimizer
json :
Matching REx "\njson :" against "{%n"
Regex match can't succeed, so not even tried
{
Matching REx "\njson :" against "  %"something%": %"here%"%n"
Intuit: trying to determine minimum start position...
  Did not find anchored substr "%njson :"...
Match rejected by optimizer
  "something": "here"
Matching REx "\njson :" against "}%n"
Regex match can't succeed, so not even tried
}
Matching REx "\njson :" against "```%n"
Regex match can't succeed, so not even tried
```
Matching REx "\njson :" against "%n"
Regex match can't succeed, so not even tried

Freeing REx: "\njson :"

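The giveaway is that each match attempt only sees a single line, so the newline always sits at the end and never in front of the text on the next line: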

Matching REx "\njson :" against "```%n"
Regex match can't succeed, so not even tried

Upvotes: 4
