tiefenauer

Reputation: 809

sed multiline delete everything before first occurrence of pattern

I have a multiline string containing some text followed by a JSON document, in the following format:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

I want to extract the JSON using sed by removing the text before it, i.e. everything up to and including MY_JSON: (note the trailing space).

My current solution:

# $str contains above multiline string
$ echo "$str" | sed '/MY_JSON: /d'

I get the following output:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

But I want the following output:

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

So the idea is to select everything up to the first occurrence of { and delete it. But that doesn't work: the lines before the matching line are not deleted, and the matching line is deleted as a whole instead of just the part before the {.
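
To illustrate the difference, here is a minimal sketch on a shortened, hypothetical version of the string above (the variable names are made up for this example). It suggests that d always drops whole matching lines, while s only edits the matched text:

```shell
# Hypothetical sample input, shortened from the string in the question.
str='Some random text
MY_JSON: {
  "foo": 1
}'

# d drops every line that matches the address; other lines pass through untouched.
deleted=$(printf '%s\n' "$str" | sed '/MY_JSON: /d')

# s rewrites only the matched text and keeps the rest of the line.
substituted=$(printf '%s\n' "$str" | sed 's/MY_JSON: //')

printf '%s\n---\n%s\n' "$deleted" "$substituted"
```

Neither command alone removes the lines in front of the marker, which is why a range address is needed as well.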

How can I best achieve this with sed?

Upvotes: 2

Views: 1759

Answers (5)

MatrixManAtYrService

Reputation: 9231

If you're lucky enough to have the JSON start on its own line, with the opening { at the very beginning of that line, like so:

echo 'MY_JSON:
{
  "foo": [
    {
      "bar": "baz"
    }
  ]
}' > file.notquitejson

Then you can extract just the json like this:

{ echo '{'; cat file.notquitejson | sed '1,/^{/d'; } > file.json

Explanation:

  • { ...; } > file.json runs a group of commands and puts their combined output in a file (bash requires the ; before the closing })
  • echo '{' starts that file with a {
  • cat file.notquitejson | sed '1,/^{/d' deletes from line 1 up to and including the first line that starts with {
  • { echo '{'; cat file.notquitejson | sed '1,/^{/d'; } > file.json therefore starts the file with a { and ends it with everything after the first line that starts with {
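
As a possible variation under the same assumption (the opening { starts its own line), sed can print from that line to the end of the input, so the brace does not have to be echoed back. This is only a sketch; the temporary file stands in for file.notquitejson:

```shell
# Hypothetical scratch file with the same layout as file.notquitejson above.
tmp=$(mktemp)
printf '%s\n' 'MY_JSON:' '{' '  "foo": [' '    {' '      "bar": "baz"' '    }' '  ]' '}' > "$tmp"

# -n suppresses automatic printing; the range runs from the first line
# that starts with { through the last line ($), and p prints each of them.
json=$(sed -n '/^{/,$p' "$tmp")

rm -f "$tmp"
printf '%s\n' "$json"
```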

Upvotes: 0

Dudi Boy

Reputation: 4900

Here is a solution that takes the positive approach: instead of removing the unwanted text, extract the wanted data from the file.

$ sed --quiet '/MY_JSON:/,$  {s/^MY_JSON: //;p}' input.1.txt
{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Explanation

--quiet Suppress automatic printing, so that each line is printed only once (by the explicit p).

/MY_JSON:/,$ Range from the first line matching the regexp /MY_JSON:/ to the last line, which is denoted by $.

{...} List of sed commands executed on each line in the range.

s/^MY_JSON: //; p Replace "MY_JSON: " at the start of the line with nothing, then print the line.
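
A self-contained way to try this out might look like the following sketch. The sample file is hypothetical, and -n is used as the short, POSIX spelling of --quiet (the long option is a GNU sed feature):

```shell
# Hypothetical sample input, standing in for input.1.txt.
input=$(mktemp)
cat > "$input" <<'EOF'
Some random text
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}
EOF

# From the first line matching MY_JSON: to the last line ($):
# strip the leading marker where present, then print the line.
json=$(sed -n '/MY_JSON:/,$ {s/^MY_JSON: //;p;}' "$input")

rm -f "$input"
printf '%s\n' "$json"
```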

Upvotes: 0

Haru Suzuki

Reputation: 142

If file has only one json structure

Input

It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed -n '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}'

Output

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
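
The 1h;1!H;${g;...} part is a common idiom for collecting the whole input in the hold space so that a single substitution can span several lines. A reduced sketch of just that idiom, with a simpler substitution than the one above and hypothetical sample text, could look like this (written with GNU sed in mind):

```shell
# Hypothetical sample input with the JSON starting in the middle of a line.
text='Some random text
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}'

# 1h   copy line 1 into the hold space
# 1!H  append every later line to the hold space
# $    on the last line: g loads the collected text into the pattern space,
#      s removes everything before the first {, and p prints the result
json=$(printf '%s\n' "$text" | sed -n '1h;1!H;${g;s/^[^{]*//;p;}')

printf '%s\n' "$json"
```

This relies on [^{] also matching newlines inside the pattern space, and it would cut at the wrong place if the text before the JSON contained a { of its own.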

If file has multiple json structures

Input

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
some
My: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'

OR

sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' -n | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'

In the above command, remove everything after the ; to retain titles such as MY_JSON: in the output.

Output

{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}

If alternatives to sed are an option, https://unix.stackexchange.com/questions/460087/extract-json-from-a-text-file-with-arbitrary-text has a good solution using grep.

Upvotes: 0

sseLtaH

Reputation: 11247

Using sed

$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
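
Read as a sketch of what the two commands appear to do: on every line that starts with a letter, the substitution removes everything up to (but not including) the first {, or the whole line if there is no {; the second command then deletes lines that are empty. The sample text below is hypothetical:

```shell
# Hypothetical sample input, shortened from the question.
text='Some random text
MY_JSON: {
  "foo": [
    { "bar": "baz" }
  ]
}'

# Lines starting with a letter lose their text before the first {;
# lines that end up (or already were) empty are deleted.
json=$(printf '%s\n' "$text" | sed 's/^[a-zA-Z][^{]*//;/^$/d')

printf '%s\n' "$json"
```

Note that this works line by line, so any line after the JSON that starts with a letter would be trimmed as well, and blank lines are always dropped.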

Upvotes: 0

anubhava

Reputation: 786319

You may use this sed:

sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Command Details:

  • 1,/MY_JSON:/: Match from line 1 to the first line that matches MY_JSON:
  • {/MY_JSON:/!d; s/^MY_JSON: *//;}: Within that range, delete every line except the matching one, then remove MY_JSON: and any following spaces from that line.
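
One edge case may be worth knowing about: when the end of a range is a regex, sed only starts testing it on the line after the range begins, so 1,/MY_JSON:/ never closes if the marker is on line 1. GNU sed offers the 0,/regex/ form for exactly that situation. The sketch below assumes GNU sed and uses hypothetical sample text:

```shell
# Hypothetical input: the marker sits on the very first line.
text='MY_JSON: {
  "foo": 1
}'

# 1,/re/: the end regex is first tested on line 2, so the range runs to
# the end of the input and the lines after the marker are deleted too.
from_one=$(printf '%s\n' "$text" | sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')

# 0,/re/ (GNU sed extension): the end regex may already match on line 1.
from_zero=$(printf '%s\n' "$text" | sed '0,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')

printf '%s\n---\n%s\n' "$from_one" "$from_zero"
```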

Upvotes: 2
