buondi

Reputation: 45

Replace a multiline pattern using Perl, sed, awk

I need to concatenate multiple JSON files, so

        ...
        "tag" : "description"
    }
]
[
    {
        "tag" : "description"
        ...

into this :

    ...
    "tag" : "description"
},
{
    "tag" : "description"
    ...

So I need to replace the pattern ] [ with a comma, but the newline character between the brackets is driving me crazy...

I tried several methods; here are some of them:

Upvotes: 4

Views: 1902

Answers (3)

mug896

Reputation: 2025

I tried this with the example in your question.

$ sed -rn '
    1{$!N;$!N}
    $!N
    /\s*}\s*\n\s*]\s*\n\s*\[\s*\n\s*\{\s*/M { 
        s//\},\n\{/
        $!N;$!N 
    }
    P;D
' file
        ...
        "tag" : "description"
},
{
        "tag" : "description"
        ...
        ...
        "tag" : "description"
},
{
        "tag" : "description"
        ...

Upvotes: 0

Borodin

Reputation: 126722

Note that a better way to combine multiple JSON files is to parse them all, combine the parsed data structures, and re-encode the result. Simply changing all occurrences of ][ to a comma may alter data instead of markup, for example when a string value itself contains ] [

sed is a minimal program that, by default, operates on a single line of a file at a time. Perl covers everything that sed or awk can do and a great deal more besides, so I suggest you stick with it

To change all ]...[ pairs in file.json (possibly separated by whitespace) to a single comma, use this

perl -0777 -pe "s/\]\s*\[/,/g" file.json > file2.json

The -0 option specifies an octal line separator, and giving it the value 777 makes perl read the entire file at once

One-liners are famously unintelligible, and I always prefer a proper program file, which would look like this

join_brackets.pl

use strict;
use warnings 'all';

my $data = do {
    local $/;
    <>;
};

$data =~ s/ \] \s* \[ /,/gx;

print $data;

and you would run it as

perl join_brackets.pl file.json > joined.json

Upvotes: 1

ikegami

Reputation: 385565

I would like to concatenate several JSON files.

If I understand correctly, you have something like the following (where letters represent valid JSON values):

to_combine/file1.json: [a,b,c]
to_combine/file2.json: [d,e,f]

And from that, you want the following:

combined.json: [a,b,c,d,e,f]

You can use the following to achieve this:

perl -MJSON::XS -0777ne'
   push @data, @{ decode_json($_) };
   END { print encode_json(\@data); }
' to_combine/*.json >combined.json

As for the problems with your Perl solution:

  1. [ has a special meaning in regex patterns. You need to escape it.
  2. You only perform one replacement.
  3. -0 doesn't actually turn on slurp mode. Use -0777.
  4. You place the comma after the newline, when it would be nicer before the newline.

Fix:

cat to_combine/*.json | perl -0777pe's/\]\n\[/,\n/g' >combined.json

Upvotes: 2
