Reputation: 1264
Here is some data that I have:
animal {
dog {
body {
parts {
legs = old
brain = average
tail= curly
}
}
}
cat {
body {
parts {
legs = new
brain = average
tail {
base=hairy
tip=nothairy
}
}
}
}
}
Notice the data is not really JSON, as it follows these rules:
- Keys and values are separated by = (with or without surrounding spaces).
- There are no " or , characters anywhere in the data.
- Entries are separated by newlines.
Is it even possible to parse this with awk or sed? I tried jq, but it does not work as this isn't really true JSON data.
My goal is to display only "dog" and "cat", because they are the top-level keys under "animal".
$ some-magical-command
dog
cat
Upvotes: 0
Views: 88
Reputation: 204731
To do what you currently want, and for ease of any future manipulation of your data, you could use any POSIX awk (for character classes) to convert your structure to JSON and then use jq on it:
$ cat tst.awk
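BEGIN { print "{" }                     # open the enclosing JSON object
!NF { next }                            # skip blank lines
{
    sub(/[[:space:]]+$/,"")             # strip trailing whitespace
    gsub(/[[:alnum:]_]+/,"\"&\"")       # quote every key and value
    gsub(/ *= */,": ")                  # "key" = "value"  ->  "key": "value"
    sub(/" *{/,"\": {")                 # "key" {          ->  "key": {
}
(++nr) > 1 {
    # Print the previous line, adding a comma when another member follows it.
    sep = ( /"/ && (prev ~ /["}]$/) ? "," : "" )
    printf "%s%s%s", prev, sep, ORS
}
{ prev = $0 }
END { print prev ORS "}" }              # flush the last line and close the object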
$ awk -f tst.awk file
{
"animal": {
"dog": {
"body": {
"parts": {
"legs": "old",
"brain": "average",
"tail": "curly"
}
}
},
"cat": {
"body": {
"parts": {
"legs": "new",
"brain": "average",
"tail": {
"base": "hairy",
"tip": "nothairy"
}
}
}
}
}
}
Current and some possible future uses (note that jq's keys sorts its output; keys_unsorted would keep the input order, dog then cat):
$ awk -f tst.awk file | jq -r '.animal | keys[]'
cat
dog
$ awk -f tst.awk file | jq -r '.animal.dog.body.parts | keys[]'
brain
legs
tail
$ awk -f tst.awk file | jq -r '.animal.dog.body.parts'
{
  "legs": "old",
  "brain": "average",
  "tail": "curly"
}
$ awk -f tst.awk file | jq -r '.animal.cat.body.parts'
{
  "legs": "new",
  "brain": "average",
  "tail": {
    "base": "hairy",
    "tip": "nothairy"
  }
}
The above assumes your input always looks as shown in your question.
Upvotes: 1
Reputation: 241971
If you only need the second-level keys, and you're not too concerned about producing good error messages for erroneous inputs, then it's pretty straightforward. The basic idea is this:
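There are three formats for an input line:
- ID {        (opens a block)
- ID = VALUE  (an assignment; the spaces around = are optional)
- }           (closes a block)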
As the lines are read, we keep track of nesting depth by incrementing a counter with the first line type and decrementing it with the third line type.
When the nesting counter is 1, if the line has an ID field, we print it.
That can be done quite simply with an awk script. This script should be saved in a file with a name like level2_keys.awk; you can then execute the command awk -f level2_keys.awk /path/to/input/file. Note that all the rules end with next; to avoid rules following a match being evaluated.
$1 == "}" { # Decrement nesting on close
    --nesting;
    next;
}
/=/ { # Remove the if block if you don't want to print these keys.
    if (nesting == 1) {
        gsub("=", " = "); # Force = to be a field
        print($1);
    }
    next;
}
$2 == "{" { # Increment nesting (and maybe print) on open
    if (nesting == 1) print($1);
    ++nesting;
    next;
}
# NF is non-zero if the line is not blank.
NF { print "Bad input at " NR ": '"$0"'" > "/dev/stderr"; }
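For a quick sanity check, here is a minimal, self-contained sketch of the same nesting-counter idea, run against a trimmed copy of the sample data. The function name level2_keys and the variable name depth are my own choices, not part of the script above, and the sketch assumes that every block opens on a line ending in { and closes on a line whose first field is }. Unlike the script above, it silently ignores assignment lines and does no error reporting.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the names of the blocks
# that open directly inside the outermost block.
level2_keys() {
    awk '
        $NF == "{" { if (depth == 1) print $1; depth++; next }  # "ID {" opens a block
        $1  == "}" { depth--; next }                            # "}" closes a block
    ' "$@"
}

# Trimmed copy of the sample input from the question.
level2_keys <<'EOF'
animal {
  dog {
    body {
      parts {
        legs = old
        tail= curly
      }
    }
  }
  cat {
    body {
      parts {
        tail {
          base=hairy
        }
      }
    }
  }
}
EOF
# prints:
# dog
# cat
```

Because the keys are printed as they are encountered, the output keeps the order of the input file.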
Upvotes: 1
Reputation: 247240
It's fairly close to Tcl syntax, if you feel like learning a new language.
set data {
animal {
dog {
body {
parts {
legs = old
brain = large
tail= curly
}
}
}
cat {
body {
parts {
legs = new
brain = tiny
tail {
base=hairy
tip=nothairy
}
}
}
}
}
}
set data [regsub -line -all {\s*=\s*(.+)} $data { "\1"}]
dict get $data animal dog body parts brain ;# => large
I know some people who would argue about your classification of dog brains vs cat brains...
Upvotes: 1