Reputation: 809
I have a multiline string containing some text followed by a JSON, so it has the following format:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
I want to extract the JSON using sed by removing the text before it, that is, everything up to and including "MY_JSON: " (note the trailing space).
My current solution:
# $str contains the multiline string above; quoting it preserves the newlines
$ echo "$str" | sed '/MY_JSON: /d'
I get the following output:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
But I want the following output:
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
So the idea is to select everything up to the first occurrence of { and delete it. But that doesn't work: it doesn't delete the first n lines before the line where the pattern matches, and on the matching line it deletes the whole line instead of just the part before the {.
How can I best achieve this with sed?
Upvotes: 2
Views: 1759
Reputation: 9231
If you're lucky enough to have the JSON start on its own line, with the opening { alone on that line, like so:
echo 'MY_JSON:
{
"foo": [
{
"bar": "baz"
}
]
}' > file.notquitejson
Then you can extract just the json like this:
{ echo '{'; sed '1,/^{/d' file.notquitejson; } > file.json
Explanation:
{ ...; } > file.json
Run the grouped commands and put their combined output in a file. In bash the last command in the group must be followed by a ; (or a newline) before the closing }.
echo '{'
Start that file with a {
sed '1,/^{/d' file.notquitejson
Delete from line 1 up to and including the first line that starts with {
Taken together, the command starts the file with a { and ends it with everything after the first line that starts with {
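The approach above can be tried end to end with the following minimal sketch. It assumes a POSIX shell and sed. The scratch directory and the `tmp` variable are illustrative and not part of the original answer. Note the ; before the closing }, which bash requires when a brace group is written on one line.

```shell
#!/bin/sh
# Hypothetical scratch directory; the names here are illustrative only.
tmp=$(mktemp -d)

# Input in the shape this answer assumes: the opening brace sits alone on its own line.
printf '%s\n' 'MY_JSON:' '{' '"foo": [' '{' '"bar": "baz"' '}' ']' '}' > "$tmp/file.notquitejson"

# Re-emit the opening brace, then print everything after the first line
# that starts with "{" (sed deletes lines 1 through that line).
{ echo '{'; sed '1,/^{/d' "$tmp/file.notquitejson"; } > "$tmp/file.json"

cat "$tmp/file.json"
```

This only works when the opening brace starts a line. For the input in the question, where the brace follows "MY_JSON: " on the same line, the range would instead end at the first nested { line.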
Upvotes: 0
Reputation: 4900
Here is a solution that takes the positive approach: instead of removing the unwanted text, extract the wanted data from the file.
$ sed --quiet '/MY_JSON:/,$ {s/^MY_JSON: //;p}' input.1.txt
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
--quiet
Suppress automatic printing, so that lines are not printed twice.
/MY_JSON:/,$
Range of text from the first line matching the regexp /MY_JSON:/ to the last line, which is denoted by $
{...}
List of sed commands executed on each line in the range.
s/^MY_JSON: //; p
Substitute "MY_JSON: " with "", then print the line.
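As a minimal sketch, the same range-and-print idea can be applied to a shell variable rather than a file. The variables `str` and `json` and the sample text are assumptions made for illustration. The short option -n is used in place of --quiet because it is the portable spelling, and "$str" is quoted so that the newlines survive word splitting.

```shell
#!/bin/sh
# Sample input modelled on the question; the variable names are illustrative.
str='Some random text
More text before the JSON:
MY_JSON: {
"foo": [
{
"bar": "baz"
}
]
}'

# -n turns off automatic printing. Inside the range from the marker line to
# the end of input, strip the marker prefix and print each line explicitly.
json=$(printf '%s\n' "$str" | sed -n '/MY_JSON:/,$ {s/^MY_JSON: //;p;}')

printf '%s\n' "$json"
```

Everything before the marker line is never printed, so no separate delete step is needed.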
Upvotes: 0
Reputation: 142
If file has only one json structure
Input
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}' -n
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
If file has multiple json structures
Input
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
some
My: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'
OR
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' -n | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'
In the first command above, remove everything after the ; to retain titles such as MY_JSON: in the output.
Output
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
If alternatives to sed are acceptable, https://unix.stackexchange.com/questions/460087/extract-json-from-a-text-file-with-arbitrary-text has a good solution using grep.
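For the multi-structure command above, here is a small sketch showing what the range depends on. It assumes that each block's closing } sits at the start of a line and that nested braces are indented. With fully unindented input the range would end at the first inner }. The sample input and the variables `input` and `out` are illustrative.

```shell
#!/bin/sh
# Illustrative input: two labelled JSON blocks whose bodies are indented.
input='Some random text
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}
some
My: {
  "x": 1
}'

# Keep only lines between a line containing "{" and a line starting with "}",
# then strip whatever precedes the first "{" on a line (labels or indentation).
out=$(printf '%s\n' "$input" | sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g')

printf '%s\n' "$out"
```

A side effect of the substitution is that an indented nested { loses its leading whitespace. That changes the layout of the output but not the JSON itself.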
Upvotes: 0
Reputation: 11247
Using sed
$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Upvotes: 0
Reputation: 786319
You may use this sed:
sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Command Details:
1,/MY_JSON:/
Match from line 1 to the first line that matches MY_JSON:
{/MY_JSON:/!d; s/^MY_JSON: *//;}
Delete every line in that range except the last one, then remove the MY_JSON: prefix from that line.
Upvotes: 2