Reputation: 461
I am working with JSON output from a tool (massdns) that is formatted as follows:
{"query_name":"1eaff.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1a811.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
I am able to use jq with slurp (-s) to beautifully output the results in the format I need:
jq -s '{ a: "xxx", "b": 123, domains: map(select(.resp_type=="A") | .resp_name[:-1] ) | unique }'
This yields a JSON object like:
{
  "a": "xxx",
  "b": 123,
  "domains": [
    "ns01.example.com",
    "ns02.example.com"
  ]
}
(See JQPlay example here.)
My problem occurs when my input scales to hundreds of thousands of lines (GBs of data), in which case slurp becomes too memory-consuming, and jq exits with an error.
I have discovered the --stream option, which allows handling large inputs, but am struggling to find a way to obtain the same output. Is there a way to use --stream (and not --slurp) to get the wanted output for a very large input file with jq?
Upvotes: 2
Views: 431
Reputation: 50750
--stream would overcomplicate this task; use the --null-input/-n option in conjunction with reduce instead.
{a: "xxx", b: 123}
| .domains = (
    reduce (inputs | select(.resp_type == "A").resp_name) as $d
      ({}; . + {($d): null})
    | keys_unsorted
    | map(.[:-1])
  )
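For completeness, here is a sketch of how such a filter might be invoked from a shell. The file name massdns.json and its three sample lines are made up for illustration, and select tests resp_type to mirror the filter in the question. With -n, jq does not consume the first input itself, so inputs hands every line to reduce one at a time and only the set of distinct names stays in memory.

```shell
# Hypothetical sample input; in practice this would be the massdns output file.
cat > massdns.json <<'EOF'
{"query_name":"1eaff.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
EOF

# -n: do not read an input implicitly, so `inputs` sees every line.
# -c: print the result compactly on one line.
result=$(jq -nc '
  {a: "xxx", b: 123}
  | .domains = (
      reduce (inputs | select(.resp_type == "A").resp_name) as $d
        ({}; . + {($d): null})   # names become object keys, so repeats collapse
      | keys_unsorted
      | map(.[:-1])              # drop the trailing dot once per unique name
    )
' massdns.json)

echo "$result"
```

This should print a single JSON object whose domains array lists each name once, without the trailing dot.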
Keeping the domains as keys of an object rather than as elements of an array makes this script more efficient in both memory consumption and CPU time. In jq, objects are added by merging, that is, by inserting all the key-value pairs from both objects into a single combined object; if both objects contain a value for the same key, the object on the right of the + wins. Duplicates therefore collapse as the input is read, so there is no need for unique.
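The merging behaviour is easy to check in isolation. A small sketch with made-up keys (keys is used here instead of keys_unsorted only so that the printed order is deterministic):

```shell
# Adding an object whose key already exists does not create a duplicate key.
merged=$(jq -nc '{"ns01.example.com.": null} + {"ns02.example.com.": null} + {"ns01.example.com.": null} | keys')
echo "$merged"    # ["ns01.example.com.","ns02.example.com."]

# When both sides define the same key, the right-hand value wins.
right_wins=$(jq -nc '{"k": 1} + {"k": 2}')
echo "$right_wins"    # {"k":2}
```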
Trimming the last character (.[:-1]) from every resp_name as it is read slows the process down as well; applying map(.[:-1]) once to the resulting array of unique names is more efficient.
See it on jqplay.
Upvotes: 3