Reputation: 72001
I'd like to be able to pipe the output of any command-line program to a command that converts it to JSON.
For example, this hypothetical program could accept target columns, a delimiter, and output field names:
# select columns 1 and 3 from the output and convert it to simple json
netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
and would generate something like so:
[
{"name": "tcp4", "other": "31"},
{"name": "tcp4", "other": "0"}
...
]
EDIT: I'm of course willing to pass in any additional info required to make it work, for example a delimiter or multiple delimiters. I'd just like to avoid explicit parsing on the command line and have the tool do that.
Upvotes: 4
Views: 8937
Reputation: 72001
I found a great list of tools for working with command-line output, and one of the tools listed is sqawk, which will convert arbitrary columnar data to JSON and let you filter it using SQL-like queries!
For example, converting ps output to JSON:
ps | sqawk -output json,indent=1 'select PID,TTY,TIME,CMD from a' trim=left header=1
[{
"PID" : "3947",
"TTY" : "pts/2",
"TIME" : "00:00:07",
"CMD" : "zsh"
},{
"PID" : "15951",
"TTY" : "pts/2",
"TIME" : "00:00:00",
"CMD"
}]
Upvotes: 4
Reputation: 54233
Here's my Ruby version:
#! /usr/bin/env ruby
#
# Converts stdin columns to a JSON array of hashes.
#
# Installation: save as convert_to_json, make it executable
# and put it somewhere in your PATH. Ruby must be installed.
#
# Examples:
#
#   netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
#   ls -l | convert_to_json
#   ls -l | convert_to_json 6,7,8,9
#   ls -l | convert_to_json 6,7,8,9 month,day,time,name
#   convert_to_json 1,2 time,value ";" < some_file.csv
#
# http://stackoverflow.com/questions/40246134/convert-arbitrary-output-to-json-by-column-in-the-terminal

require 'json'

script_name = File.basename(__FILE__)
syntax = "Syntax : command_which_outputs_columns | #{script_name} column1_id,column2_id,...,columnN_id column1_name,column2_name,...,columnN_name delimiter"

if $stdin.tty? or $stdin.closed?
  $stderr.puts syntax
else
  if ARGV[2]
    delimiter = ARGV[2]
    $stderr.puts "#{script_name} : Using #{delimiter} as delimiter"
  else
    delimiter = /\s+/
  end

  column_ids   = (ARGV[0] || "").split(',').map { |column_id| column_id.to_i - 1 }
  column_names = (ARGV[1] || "").split(',')

  results = []
  $stdin.each do |stdin_line|
    values = stdin_line.strip.split(delimiter)
    values = values.values_at(*column_ids) unless column_ids.empty?
    line_hash = {}
    values.each_with_index do |value, i|
      # Fall back to "columnN" when no name was supplied for this column.
      column_name = column_names[i] || "column#{(column_ids[i] || i) + 1}"
      line_hash[column_name] = value
    end
    results << line_hash
  end
  puts JSON.pretty_generate(results)
end
It works as defined in your example:
netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
[
{
"name": "tcp",
"other": "0"
},
{
"name": "tcp6",
"other": "0"
}
]
As a bonus, you can omit the column names or ids entirely, or pass a custom delimiter.
Other examples:
netstat -a | grep CLOSE_WAIT | ./convert_to_json
# [
# {
# "column1": "tcp",
# "column2": "1",
# "column3": "0",
# "column4": "10.0.2.15:51074",
# "column5": "123.45.101.207:https",
# "column6": "CLOSE_WAIT"
# },
# {
# "column1": "tcp6",
# "column2": "1",
# "column3": "0",
# "column4": "ip6-localhost:50293",
# "column5": "ip6-localhost:ipp",
# "column6": "CLOSE_WAIT"
# }
# ]
netstat -a | grep CLOSE_WAIT | ./convert_to_json 1,3
# [
# {
# "column1": "tcp",
# "column3": "0"
# },
# {
# "column1": "tcp6",
# "column3": "0"
# }
# ]
ls -l | tail -n3 | convert_to_json 6,7,8,9 month,day,time,name
# [
# {
# "month": "Oct",
# "day": "27",
# "time": "10:35",
# "name": "test.dot"
# },
# {
# "month": "Nov",
# "day": "2",
# "time": "14:27",
# "name": "uniq.rb"
# },
# {
# "month": "Nov",
# "day": "2",
# "time": "14:27",
# "name": "utf8_nokogiri.rb"
# }
# ]
# NOTE: for older files, ls -l puts the year, not the time, in the 8th column:
ls --full-time -t /usr/share/doc | tail -n3 | ./convert_to_json 6,7,9 yyyymmdd,time,name
[
{
"yyyymmdd": "2013-10-21",
"time": "15:15:20.000000000",
"name": "libbz2-dev"
},
{
"yyyymmdd": "2013-10-10",
"time": "16:27:32.000000000",
"name": "zsh"
},
{
"yyyymmdd": "2013-10-03",
"time": "18:52:45.000000000",
"name": "manpages-dev"
}
]
ls -l | tail -n3 | convert_to_json 9,12
# [
# {
# "column9": "test.dot",
# "column12": null
# },
# {
# "column9": "uniq.rb",
# "column12": null
# },
# {
# "column9": "utf8_nokogiri.rb",
# "column12": null
# }
# ]
convert_to_json 1,2 time,value ";" < some_file.csv
# convert_to_json : Using ; as delimiter
# [
# {
# "time": "1",
# "value": "3"
# },
# {
# "time": "2",
# "value": "5"
# }
# ]
Upvotes: 5
Reputation: 2546
Here is my Python version:
#!/usr/bin/env python3
import json
import re


def all_columns_to_json(column_dict, columns_line):
    '''Convert every column of columns_line to a dict,
    using names from column_dict when available.'''
    json_object = {}
    for column_index, column_value in enumerate(columns_line):
        if column_index in column_dict:
            column_name = column_dict[column_index]
        else:
            column_name = str(column_index)
        json_object[column_name] = column_value
    return json_object


def filter_columns_in_dict_to_json(column_dict, columns_line):
    '''Make sure every element of column_dict exists in columns_line,
    filter out the elements that are not in column_dict,
    and convert the rest to a dict.'''
    json_object = {}
    for column_index, column_name in column_dict.items():
        try:
            json_object[column_name] = columns_line[column_index]
        except IndexError as err:
            # columns_line doesn't have column_index.
            raise ValueError('Invalid table line ({}): no {} element.'.format(
                columns_line, column_index)) from err
    return json_object


def columns_line_to_json(column_dict, columns_line, should_filter_columns):
    '''Convert a list of values to a json object with the chosen names.'''
    if should_filter_columns:
        return filter_columns_in_dict_to_json(column_dict, columns_line)
    else:
        return all_columns_to_json(column_dict, columns_line)


def regex_from_delims_list(delims_list):
    '''Build a compiled regex pattern from a delims list.'''
    final_pattern = ''
    for delim in delims_list:
        delim_and_maybe_min_max = delim.split(':')
        escaped_delim = re.escape(delim_and_maybe_min_max[0])
        # Check if this is a delim without a min count.
        if len(delim_and_maybe_min_max) == 1:
            final_pattern += "%s{1,}|" % (escaped_delim)
        elif len(delim_and_maybe_min_max) == 2:
            min_and_maybe_max = delim_and_maybe_min_max[1].split('-')
            current_pattern = escaped_delim
            # Add the count to the regex (min only, or min and max).
            if len(min_and_maybe_max) == 2:
                current_pattern += '{%d,%d}' % (int(min_and_maybe_max[0]),
                                                int(min_and_maybe_max[1]))
            else:
                current_pattern += '{%d,}' % (int(min_and_maybe_max[0]))
            final_pattern += current_pattern + '|'
        else:
            raise ValueError("Invalid ':' count in the delimiter argument")
    # Remove the trailing OR ('|').
    final_pattern = final_pattern[:-1]
    return re.compile(final_pattern)


def main(args):
    column_dict = {}
    # Split the user's argument by commas, and parse each column separately.
    for column_and_name in args.columns_and_names.split(','):
        # Split the name from the column.
        column_and_name = column_and_name.split('=')
        if len(column_and_name) > 2:
            raise ValueError("Invalid column: {}".format(str(column_and_name)))
        # If there is no name, set it to the column index.
        if len(column_and_name) == 1:
            column_and_name.append(str(column_and_name[0]))
        # Try to convert the column index if it isn't '*'.
        if column_and_name[0] != '*':
            try:
                column_and_name[0] = int(column_and_name[0])
            except ValueError as err:
                raise ValueError('Invalid column index: {} (not an integer)'.format(
                    column_and_name[0])) from err
        # Add this column definition.
        column_dict[column_and_name[0]] = column_and_name[1]
    # Check if column_dict has the '*' member. If it does, we will print
    # all of the columns (even ones that are not in column_dict).
    should_filter_columns = ('*' not in column_dict)
    # We have checked it, no need for it now.
    if not should_filter_columns:
        del column_dict['*']
    # Parse the delims list into a regex pattern.
    split_regex_pattern = regex_from_delims_list(args.delim)
    json_objects_list = []
    for fd in args.infiles:
        for line in fd:
            # Convert a bytes object to a string.
            if isinstance(line, bytes):
                line = line.decode('utf-8')
            # Strip the \n at the end of the line.
            line = line.rstrip('\n')
            # Split the line by the delims.
            splitted_line = re.split(split_regex_pattern, line)
            json_objects_list.append(columns_line_to_json(
                column_dict, splitted_line, should_filter_columns))
    print(json.dumps(json_objects_list))


def comma_list(string):
    '''Convert a comma list '1,2,3' to the list ['1', '2', '3'],
    where ',' can be escaped as '\\,' and '\\' as '\\\\'.'''
    # Split the string on commas that are not preceded by a backslash.
    splitted_string = re.split(r'(?<!\\),', string)
    replaced_string = []
    # Replace '\,' with ',' and '\\' with '\'.
    for substring in splitted_string:
        substring = substring.replace('\\,', ',')
        substring = substring.replace('\\\\', '\\')
        replaced_string.append(substring)
    return replaced_string


if __name__ == '__main__':
    import argparse
    from sys import stdin

    parser = argparse.ArgumentParser()
    parser.add_argument('columns_and_names',
                        help='The columns and their names to print out (format: n=name)',
                        default='*')
    parser.add_argument('--delim', '-d', type=comma_list,
                        help='A list of input column delimiters. Format: delim[:min[-max]], '
                             'where `min` and `max` are the numbers of times `delim` should repeat. '
                             'By default min=1 and max is not set. '
                             'Enter "\\," for the delimiter "," and "\\\\" for "\\"',
                        default=(' ', '\t'),
                        metavar='delim[:min-max]')
    parser.add_argument('infiles', type=argparse.FileType('rb'),
                        default=(stdin,), metavar='file', nargs='*')
    main(parser.parse_args())
(See https://github.com/Reflexe/convert_table_to_json for more usage examples)
I tried to look for a similar program but could not find anything, so I had to write it myself (I think it is a very useful tool).
For example, to use it with netstat:
$ netstat -a | grep ESTABLISHED | ./convert_to_json.py '2=name,3=other'
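Note that the script indexes the split fields directly, so the column indices are 0-based. The --delim option accepts a comma-separated list of delim[:min[-max]] specs; for example, to parse a semicolon-separated file (some_file.csv is a hypothetical name):
$ ./convert_to_json.py '0=time,1=value' -d ';' < some_file.csv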
Upvotes: 3
Reputation: 70822
Because the terminal is a very special kind of interface, with monospaced fonts and tools built for humans to monitor, a lot of command output can be very difficult to parse. netstat output is a good sample:
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 13947569 @/tmp/.X11-unix/X1
unix 2 [ ] DGRAM 8760 /run/systemd/notify
unix 2 [ ACC ] SEQPACKET LISTENING 8790 /run/udev/control
Since some lines contain blank fields, the output cannot simply be split on whitespace. Because of this, the requested convert_to_json script is posted at the very bottom of this answer.
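To make the blank-field problem concrete, here is a minimal Python sketch of the header-position idea that the netstat2json.pl script below implements in perl (an illustration only; it assumes single-word, unique header labels such as those in the listing above): slice each row at the offsets where the header labels start, so empty cells stay empty instead of shifting their neighbours.
#!/usr/bin/env python3
# Sketch: fixed-width parsing driven by header label positions.
import json
import sys

lines = [l.rstrip('\n') for l in sys.stdin if not l.startswith('Active')]
header, rows = lines[0], lines[1:]
names = header.split()
# A column starts where its label sits in the header...
starts = [header.index(name) for name in names]
# ...and runs up to the start of the next column.
ends = starts[1:] + [None]

out = [{name: row[start:end].strip()
        for name, start, end in zip(names, starts, ends)}
       for row in rows]
print(json.dumps(out, indent=1))
Fed the listing above, the DGRAM line comes out with an empty "State" value rather than with its fields shifted left.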
awk
By using awk, you get a nice, terse syntax:
netstat -an |
awk '/CLOSE_WAIT/{
printf " { \42%s\42:\42%s\42,\42%s\42:\42%s\42},\n","name",$1,"other",$3
}' |
sed '1s/^/[\n/;$s/,$/\n]/'
perl, using the JSON library
This perl way is more flexible:
netstat -an | perl -MJSON::XS -ne 'push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;END{print encode_json(\@out)."\n";}'
or the same, but split across lines:
netstat -an |
perl -MJSON::XS -ne '
push @out,{"name"=>,$1,"other"=>$2} if
/^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
END{print encode_json(\@out)."\n";
}'
Or pretty-printed:
netstat -an | perl -MJSON::XS -ne '
push @out,{"name"=>,$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
END{$coder = JSON::XS->new->ascii->pretty->allow_nonref;
print $coder->encode(\@out);}'
Finally, I like this version, which is not based on a regex:
netstat -an | perl -MJSON::XS -ne '
do {
my @line=split(/\s+/);
push @out,{"name"=>,$line[0],"other"=>$line[2]}
} if /CLOSE_WAIT/;
END{
$coder = JSON::XS->new->ascii->pretty->allow_nonref;
print $coder->encode(\@out);
}'
But you could also run the command from inside the perl script:
perl -MJSON::XS -e '
open STDIN,"netstat -an|";
my @out;
while (<>){
push @out,{"name"=>,$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
};
print encode_json \@out;'
This could become a basic prototype:
#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
$ENV{'LANG'}='C';
open STDIN,"netstat -naut|";
my @out;
my @fields;
my $searchre=":";
$searchre = shift @ARGV if @ARGV;
while (<>){
map { s/_/ /g;push @fields,$_; } split(/\s+/) if
/^Proto.*State/ && s/\sAddr/_Addr/g;
do {
my @line=split(/\s+/);
my %entry;
for my $i (0..$#fields) {
$entry{$fields[$i]}=$line[$i];
};
push @out,\%entry;
} if /$searchre/;
}
print $coder->encode(\@out);
(Without an argument, this will dump the entire netstat -naut output, but you could give any search string as an argument, like CLOSE or an IP.)
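If the prototype is saved under a name of your choice, for example netstat_prototype.pl (a hypothetical name), it can be run as ./netstat_prototype.pl CLOSE_WAIT to dump only the CLOSE_WAIT entries.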
netstat2json.pl
This method could work with many tools other than netstat, with some corrections:
#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
$ENV{'LANG'}='C';
open STDIN,"netstat -nap|";
my ( $searchre ,@out,%fields)=( "[/:]" );
$searchre = shift @ARGV if @ARGV;
while (<>){
next if /^Active\s.*\)$/;
/^Proto.*State/ && do {
s/\s(name|Addr)/_$1/g;
my @head;
map { s/_/ /g;push @head,$_; } split(/\s+/);
s/_/ /g;
%fields=();
for my $i (0..$#head) {
my $crt=index($_,$head[$i]);
my $next=-1;
$next=index($_,$head[$i+1])-$crt-1 if $i < $#head;
$fields{$head[$i]}=[$crt,$next];
}
next;
};
do {
my $line=$_;
my %entry;
for my $i (keys %fields) {
my $crt=substr($line,$fields{$i}[0],$fields{$i}[1]);
$crt=~s/^\s*(\S(|.*\S))\s*$/$1/;
$entry{$i}=$crt;
};
push @out,\%entry;
} if /$searchre/;
}
print $coder->encode(\@out);
(The header-matching regex Proto.*State is specific to netstat.) This can be run with arguments, as before:
./netstat2json.pl CLOS
[
{
"Local Address" : "127.0.0.1:31001",
"State" : "CLOSE_WAIT",
"Recv-Q" : "18",
"Proto" : "tcp",
"Send-Q" : "0",
"Foreign Address" : "127.0.0.1:55938",
"PID/Program name" : "-"
},
{
"Recv-Q" : "1",
"Local Address" : "::1:53816",
"State" : "CLOSE_WAIT",
"Send-Q" : "0",
"PID/Program name" : "-",
"Foreign Address" : "::1:631",
"Proto" : "tcp6"
}
]
And empty fields don't break the field assignment:
./netstat2json.pl 1000.*systemd/notify
[
{
"Proto" : "unix",
"I-Node" : "33378",
"RefCnt" : "2",
"Path" : "/run/user/1000/systemd/notify",
"PID/Program name" : "-",
"Type" : "DGRAM",
"Flags" : "[ ]",
"State" : ""
}
]
Note: this modified version runs netstat with the -nap arguments in order to get the PID/Program name field.
If not run by the superuser root, you may get this output on STDERR:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
You could avoid this warning by running netstat2json.pl 2>/dev/null, by running as root or with sudo, or by changing "netstat -nap|" to "netstat -na|" at line #6 of the script.

convert_to_json.pl
Finally, here is the convert_to_json.pl perl script to transform STDIN to JSON, strictly as requested, to be run as netstat -an | grep CLOSE | ./convert_to_json.pl 1,3 name,other
#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
my (@fields,@pos,@out);
map {
push @pos,1*$_-1
} split ",",shift @ARGV;
map {
push @fields,$_
} split ",",shift @ARGV;
die "Number of fields don't match number of positions" if $#fields ne $#pos;
while (<>) {
my @line=split(/\s+/);
my %entry;
for my $i (0..$#fields) {
$entry{$fields[$i]}=$line[$pos[$i]];
};
push @out,\%entry;
}
print $coder->encode(\@out);
Upvotes: 6
Reputation: 3172
Bash is probably not the best platform for this, but here is a half-baked solution with some of the tricks you will need.
#!/bin/bash
function procline {
    IFS=' ' list=($1)
    echo -n "{ \"first_column\": \"${list[0]}\","
    echo " \"second_column\": \"${list[1]}\" },"
}
tr -s " " | while IFS= read -r line; do procline "$line"; done
Some explanation:
- tr -s squeezes the repeated spaces
- the while IFS= read ... loop passes the result line by line to the procline function
- the procline function first splits the line, then builds kinda-JSON output

My opinion is that the output of tr should be passed to a script written in some other language, e.g. Python or PHP-CLI, if not the whole job. It looks pretty easy to process:
tcp4 0 0 192.168.99.1.56358 192.168.99.100.32111 CLOSE_WAIT
tcp4 31 0 192.168.100.179.56129 server-54-192-20.https CLOSE_WAIT
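For example, a minimal Python sketch of such a follow-up script (the column-to-name mapping is a hypothetical hard-coded choice, matching the question's example):
#!/usr/bin/env python3
# Sketch: read whitespace-separated columns from stdin and emit
# a JSON array of objects keyed by the names chosen below.
import json
import sys

columns = {0: "name", 2: "other"}  # hypothetical fixed mapping

rows = []
for line in sys.stdin:
    fields = line.split()
    if len(fields) > max(columns):  # skip lines that are too short
        rows.append({name: fields[i] for i, name in columns.items()})

print(json.dumps(rows, indent=2))
It would slot into the pipeline as netstat -a | grep CLOSE_WAIT | tr -s " " | ./to_json.py (to_json.py being a hypothetical file name); line.split() also squeezes repeated spaces on its own, so the tr step becomes optional.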
Upvotes: 0
Reputation: 207475
Just a really basic concept for an approach that is not at all rigorous or fully-featured, but which might give you an idea how to do the bulk of this with awk, since you don't seem to have much yet!
netstat -a | grep CLOSE_WAIT | awk 'BEGIN{print "["} {print " {\"name\": \"",$1,"\", \"other\": \"",$2,"\"}"} END{print "]"}' OFS=""
[
{"name": "tcp4", "other": "31"}
{"name": "tcp4", "other": "31"}
{"name": "tcp4", "other": "31"}
{"name": "tcp4", "other": "31"}
{"name": "tcp4", "other": "0"}
{"name": "tcp4", "other": "31"}
]
I know it doesn't put the commas at the line ends, and I know it doesn't take parameters, but both of those are solvable.
One idea for parameters would be to pass them to awk like this:
awk -v fields="1:4:7" -v headings="name:other:fred" '{...}'
then split() those in the BEGIN section and iterate through them in the main loop. That would look something like this:
echo hi | awk -v fields="1:3:5" -v headings="HeadA:HeadB:HeadC" 'BEGIN{split(headings,h,":"); split(fields,f,":")} {for(i in h)print h[i],f[i];}'
HeadA 1
HeadB 3
HeadC 5
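For comparison, here is the same fields/headings idea with the line-end commas handled, sketched in Python rather than awk (a hypothetical companion script; serialization is delegated to the json module so the commas come for free):
#!/usr/bin/env python3
# Sketch: take colon-separated field numbers and headings as
# arguments, and let the json module worry about commas.
import json
import sys

fields = [int(n) for n in sys.argv[1].split(":")]  # e.g. "1:3" (1-based, like awk's $1)
headings = sys.argv[2].split(":")                  # e.g. "name:other"

rows = []
for line in sys.stdin:
    cols = line.split()
    if len(cols) >= max(fields):  # skip lines that are too short
        rows.append({h: cols[f - 1] for f, h in zip(fields, headings)})

print(json.dumps(rows, indent=1))
It would be invoked as netstat -a | grep CLOSE_WAIT | ./sketch.py 1:3 name:other (sketch.py being a hypothetical file name).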
Upvotes: 3