Brad Parks

Reputation: 72001

Convert arbitrary output to json by column in the terminal?

I'd like to be able to pipe the output from any command line program to a command that converts it to json.

For example, my hypothetical program could accept target columns, a delimiter, and output field names:

# select columns 1 and 3 from the output and convert it to simple json
netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other

and it would generate something like this:

[ 
  {"name": "tcp4", "other": "31"},
  {"name": "tcp4", "other": "0"} 
...
]

I'm looking for something that works for any program, not just netstat!

I'm open to installing any third-party tool/open-source project, and tend to run things on Linux/OS X. It does not have to be a bash script solution; it can be written in node, perl, python, etc.

EDIT: I'm of course willing to pass in any more info that'd be required to make it work, for example a delimiter or multiple delimiters - I'd just like to avoid explicit parsing in the command line, and have the tool do that.

Upvotes: 4

Views: 8937

Answers (6)

Brad Parks

Reputation: 72001

I found a great list of tools for working with command line output, and one of the tools listed is sqawk, which will convert arbitrary columnar data to json and let you filter it using SQL-like queries!

Convert ps output to JSON

ps | sqawk -output json,indent=1 'select PID,TTY,TIME,CMD from a' trim=left header=1

Output

[{
    "PID"  : "3947",
    "TTY"  : "pts/2",
    "TIME" : "00:00:07",
    "CMD"  : "zsh"
},{
    "PID"  : "15951",
    "TTY"  : "pts/2",
    "TIME" : "00:00:00",
    "CMD"
}]
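
Applied to the question's netstat example, a sketch like this should work (hedged: it assumes sqawk's default anonymous column names a1, a2, ... for the input table a when no header row is given):

netstat -an | grep CLOSE_WAIT |
    sqawk -output json 'select a1 as name, a3 as other from a'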

Upvotes: 4

Eric Duminil

Reputation: 54233

Here's my Ruby version:

#! /usr/bin/env ruby
#
# Converts stdin columns to a JSON array of hashes
#
# Installation: save as convert_to_json, make it executable and put it somewhere in PATH. Ruby must be installed.
#
# Examples:
#
# netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
# ls -l | convert_to_json
# ls -l | convert_to_json 6,7,8,9
# ls -l | convert_to_json 6,7,8,9 month,day,time,name
# convert_to_json 1,2 time,value ";" < some_file.csv
#
#
# http://stackoverflow.com/questions/40246134/convert-arbitrary-output-to-json-by-column-in-the-terminal

require 'json'

script_name = File.basename(__FILE__)
syntax = "Syntax : command_which_outputs_columns | #{script_name} column1_id,column2_id,...,columnN_id column1_name,column2_name,...,columnN_name delimiter"


if $stdin.tty? or $stdin.closed? then
  $stderr.puts syntax
else
  if ARGV[2]
    delimiter = ARGV[2]
    $stderr.puts "#{script_name} : Using #{delimiter} as delimiter"
  else
    delimiter = /\s+/
  end

  column_ids = (ARGV[0] || "").split(',').map{|column_id| column_id.to_i-1}
  column_names = (ARGV[1] || "").split(',')

  results = []
  $stdin.each do |stdin_line|
    if column_ids.empty?
      values = stdin_line.strip.split(delimiter)
    else
      values = stdin_line.strip.split(delimiter).values_at(*column_ids)
    end
    line_hash = Hash.new
    values.each_with_index{|value, i|
      column_name = column_names[i] || "column#{(column_ids[i] || i)+1}"
      line_hash[column_name] = value
    }
    results<<line_hash
  end
  puts JSON.pretty_generate(results)
end

It works as defined in your example:

netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
[
  {
    "name": "tcp",
    "other": "0"
  },
  {
    "name": "tcp6",
    "other": "0"
  }
]

As a bonus, you can

  • omit the parameters: every column will be converted to json
  • omit the names: columns will be called column1, column2, ...
  • choose a missing column: its value will be null
  • define a delimiter as the third parameter (default is whitespace)

Other examples:

netstat -a | grep CLOSE_WAIT | ./convert_to_json
# [
#   {
#     "column1": "tcp",
#     "column2": "1",
#     "column3": "0",
#     "column4": "10.0.2.15:51074",
#     "column5": "123.45.101.207:https",
#     "column6": "CLOSE_WAIT"
#   },
#   {
#     "column1": "tcp6",
#     "column2": "1",
#     "column3": "0",
#     "column4": "ip6-localhost:50293",
#     "column5": "ip6-localhost:ipp",
#     "column6": "CLOSE_WAIT"
#   }
# ]

netstat -a | grep CLOSE_WAIT | ./convert_to_json 1,3
# [
#   {
#     "column1": "tcp",
#     "column3": "0"
#   },
#   {
#     "column1": "tcp6",
#     "column3": "0"
#   }
# ]

ls -l | tail -n3 | convert_to_json 6,7,8,9 month,day,time,name
# [
#   {
#     "month": "Oct",
#     "day": "27",
#     "time": "10:35",
#     "name": "test.dot"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "uniq.rb"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "utf8_nokogiri.rb"
#   }
# ]

# NOTE: ls -l uses the 8th column for year, not time, for older files:
ls --full-time -t /usr/share/doc | tail -n3 | ./convert_to_json 6,7,9 yyyymmdd,time,name
[
  {
    "yyyymmdd": "2013-10-21",
    "time": "15:15:20.000000000",
    "name": "libbz2-dev"
  },
  {
    "yyyymmdd": "2013-10-10",
    "time": "16:27:32.000000000",
    "name": "zsh"
  },
  {
    "yyyymmdd": "2013-10-03",
    "time": "18:52:45.000000000",
    "name": "manpages-dev"
  }
]

ls -l | tail -n3 | convert_to_json 9,12
# [
#   {
#     "column9": "test.dot",
#     "column12": null
#   },
#   {
#     "column9": "uniq.rb",
#     "column12": null
#   },
#   {
#     "column9": "utf8_nokogiri.rb",
#     "column12": null
#   }
# ]

convert_to_json 1,2 time,value ";" < some_file.csv
# convert_to_json : Using ; as delimiter
# [
#   {
#     "time": "1",
#     "value": "3"
#   },
#   {
#     "time": "2",
#     "value": "5"
#   }
# ]

Upvotes: 5

Shmuel H.

Reputation: 2546

Here is my Python version:

#!/usr/bin/env python3

import json
import re

def all_columns_to_json (column_dict, columns_line):    
    json_object = {}    

    for column_index, column_value in enumerate(columns_line):
        if column_index in column_dict:
            column_name = column_dict[column_index]
        else:
            column_name = str(column_index)

        json_object[column_name] = column_value

    return json_object


def filter_columns_in_dict_to_json(column_dict, columns_line):
    '''Parse columns_line, make sure every element in column_dict
       exists there, filter elements that are not in column_dict from 
       columns_line, and convert it to a dict.
    '''
    json_object = {}    

    for column_index, column_name in column_dict.items():
        try:
            json_object[column_name] = columns_line[column_index]
        except IndexError as err:
            # columns_line doesn't have column_index.

            raise ValueError('Invalid table line ({}) : no {} element.'.format(columns_line,
                                                                               column_index)) from err     

    return json_object

def columns_line_to_json (column_dict, columns_line, should_filter_columns):
    '''Parse a list of values to a json object with special names.
    '''

    if should_filter_columns:
        return filter_columns_in_dict_to_json(column_dict, columns_line)
    else:
        return all_columns_to_json(column_dict, columns_line)

def regex_from_delims_list(delims_list):
    '''Get a compiled regex pattern from a delims list.'''

    final_pattern = ''

    for delim in delims_list:
        delim_and_maybe_min_max = delim.split(':')

        escaped_delim = re.escape(delim_and_maybe_min_max[0])

        # A delimiter without a min/max count: match one or more repetitions.
        if len(delim_and_maybe_min_max) == 1:
            final_pattern += "%s{1,}|" % (escaped_delim)
        elif len(delim_and_maybe_min_max) == 2:
            min_and_maybe_max = delim_and_maybe_min_max[1].split('-')

            current_pattern = escaped_delim

            # Add the count to the regex (min only, or both min and max).
            if len(min_and_maybe_max) == 2:
                current_pattern += '{%d,%d}' % (int(min_and_maybe_max[0]),
                                                int(min_and_maybe_max[1]))
            else:
                current_pattern += '{%d,}' % (int(min_and_maybe_max[0]))

            final_pattern += current_pattern + '|'
        else:
            raise ValueError("Invalid ':' count in the delimiter argument")

    # Remove the trailing OR ('|').
    final_pattern = final_pattern[:-1]

    return re.compile(final_pattern)


def main(args):
    column_dict = {}    

    # Split the user's argument by commas, and parse each column
    # separately.
    for column_and_name in args.columns_and_names.split(','):
        # Split the name from the columns.
        column_and_name = column_and_name.split('=')
        if len(column_and_name) > 2:
            raise ValueError("Invalid column: {}".format(str(column_and_name())))

        # If there is no name, set it to the column index.
        if len(column_and_name) == 1:
            column_and_name.append (str(column_and_name[0]))

        # Try to convert the column index if it isn't '*'.
        if column_and_name[0] != '*':
            try:
                column_and_name[0] = int(column_and_name[0])
            except ValueError as err:
                raise ValueError('Invalid column index: {} (not an integer)'.format(column_and_name[0])) from err

        # Add this column definition. 
        column_dict[column_and_name[0]] = column_and_name[1]


    # Check if column_dict has the '*' member.
    # If it does, we will print all of the columns (even ones that
    # are not in column_dict)
    should_filter_columns = ('*' not in column_dict)

    # We have checked it, no need for it now.
    if not should_filter_columns:
        del column_dict['*']

    # Parse the delim list into a regex pattern.
    # (argparse stores the --delim option as args.delim.)
    strip_regex_pattern = regex_from_delims_list(args.delim)

    json_objects_list = []    

    for fd in args.infiles:
        for line in fd:
            # Convert bytes object to string.
            if isinstance(line, bytes): 
                line = line.decode('utf-8')

            # Strip the \n in the end of the line.
            line = line.rstrip('\n')            

            # Split the line by the delims.
            splitted_line = re.split(strip_regex_pattern, line)

            json_objects_list.append (columns_line_to_json (column_dict, splitted_line, should_filter_columns))

    print(json.dumps (json_objects_list))


def comma_list(string):
    '''Convert a comma list '1,2,3,4' to the list
    ['1','2','3','4']; a literal ',' can be escaped with one \\ char.'''

    # Split the string on commas that are not preceded by a backslash.
    splitted_string = re.split(r'(?<!\\),', string)

    replaced_strings = []

    # Replace '\,' with ',' and '\\' with '\'.
    for part in splitted_string:
        part = part.replace('\\,', ',')
        part = part.replace('\\\\', '\\')

        replaced_strings.append(part)

    return replaced_strings

if __name__ == '__main__':
    import argparse    
    from sys import stdin

    parser = argparse.ArgumentParser()
    parser.add_argument('columns_and_names', help='The columns and their names to print out (format: n=name)', default='*')
    parser.add_argument('--delim', '-d', type=comma_list,
                        help='A list of input column delimiters. Format: delim[:min[-max]], where `min` and `max` are the numbers of times `delim` should repeat. By default min=1 and max is not set. Enter "\," for the delimiter "," and "\\\\" for "\\"',
                        default=(' ', '\t'),
                        metavar='delim[:min-max]')
    parser.add_argument('infiles', type=argparse.FileType('rb'), default=(stdin,), metavar='file', nargs='*')

    main(parser.parse_args())

(See https://github.com/Reflexe/convert_table_to_json for more usage examples)

I tried to find a similar program, but I couldn't find anything, so I had to write it (I think it is a very useful tool).

For example, to use it with netstat, use this:

$ netstat -a | grep ESTABLISHED |  ./convert_to_json.py  '2=name,3=other'
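
A sketch of the --delim option (hypothetical semicolon-separated input; note that column indices are zero-based in this script):

$ printf '1;3\n2;5\n' | ./convert_to_json.py '0=time,1=value' --delim ';'
[{"time": "1", "value": "3"}, {"time": "2", "value": "5"}]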

Upvotes: 3

F. Hauri - Give Up GitHub

Reputation: 70822

Filtering STDIN to build a json variable

Introduction

As the terminal is a very special kind of interface, with monospaced fonts, and tools are built to be watched by humans on this terminal, a lot of output can be very difficult to parse:

netstat output is a good sample:

Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     13947569 @/tmp/.X11-unix/X1
unix  2      [ ]         DGRAM                    8760     /run/systemd/notify
unix  2      [ ACC ]     SEQPACKET  LISTENING     8790     /run/udev/control

Where some lines contain blank fields, the output cannot simply be split on spaces.

Because of this, the requested convert_to_json script is posted at the very bottom of this answer.

Simple space based splitting with awk

By using awk, you get a nice syntax (\42 is the octal escape for a double quote):

netstat -an |
    awk '/CLOSE_WAIT/{
        printf "  { \42%s\42:\42%s\42,\42%s\42:\42%s\42},\n","name",$1,"other",$3
    }' |
    sed '1s/^/[\n/;$s/,$/\n]/'

Simple space-based splitting with perl, but using a json library

But this way is more flexible:

netstat -an | perl -MJSON::XS -ne 'push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;END{print encode_json(\@out)."\n";}'

or the same, split across lines:

netstat -an |
    perl -MJSON::XS -ne '
        push @out,{"name"=>$1,"other"=>$2} if
                /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
        END{print encode_json(\@out)."\n";
}'

Or pretty-printed:

netstat -an | perl -MJSON::XS -ne '
    push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    END{$coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);}'

Finally, I like this version, which is not based on capture groups:

netstat -an | perl -MJSON::XS -ne '
    do {
        my @line=split(/\s+/);
        push @out,{"name"=>$line[0],"other"=>$line[2]}
    } if /CLOSE_WAIT/;
    END{
        $coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);
    }'

But you could also run the command inside the perl script:

perl -MJSON::XS -e '
    open STDIN,"netstat -an|";
    my @out;
    while (<>){
        push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    };
    print encode_json \@out;'

This could become a basic prototype:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

$ENV{'LANG'}='C';
open STDIN,"netstat -naut|";
my @out;
my @fields;

my $searchre=":";
$searchre = shift @ARGV if @ARGV;

while (<>){
    # On the header line, join two-word headings ("Local Address" etc.)
    # into one token, then store the field names.
    map { s/_/ /g;push @fields,$_; } split(/\s+/) if
        /^Proto.*State/ && s/\sAddr/_Addr/g;
    do {
        my @line=split(/\s+/);
        my %entry;
        for my $i (0..$#fields) {
            $entry{$fields[$i]}=$line[$i];
        };
        push @out,\%entry;
    } if /$searchre/;
}

print $coder->encode(\@out);

(Without an argument, this will dump the entire netstat -naut output, but you could give any search string as an argument, like CLOSE or an IP.)

Positional parameters, netstat2json.pl

This method could work with many tools other than netstat, with some corrections:

#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
$ENV{'LANG'}='C';
open STDIN,"netstat -nap|";
my ( $searchre ,@out,%fields)=( "[/:]" );
$searchre = shift @ARGV if @ARGV;
while (<>){
    next if /^Active\s.*\)$/;
    /^Proto.*State/ && do {
        s/\s(name|Addr)/_$1/g;
        my @head;
        map { s/_/ /g;push @head,$_; } split(/\s+/);
        s/_/ /g;
        %fields=();
        for my $i (0..$#head) {
            my $crt=index($_,$head[$i]);
            my $next=-1;
            $next=index($_,$head[$i+1])-$crt-1 if $i < $#head;
            $fields{$head[$i]}=[$crt,$next];
        }
        next;
    };
    do {
        my $line=$_;
        my %entry;
        for my $i (keys %fields) {
            my $crt=substr($line,$fields{$i}[0],$fields{$i}[1]);
            $crt=~s/^\s*(\S(|.*\S))\s*$/$1/;
            $entry{$i}=$crt;
        };
        push @out,\%entry;
    } if /$searchre/;
}
print $coder->encode(\@out);
  • find header lines Proto.*State (specific to netstat)
  • store field names with position and length
  • split each line by field position and length, then trim spaces
  • dump the variable as a json string.

This could be run with arguments, as previously:

./netstat2json.pl CLOS
[
   {
      "Local Address" : "127.0.0.1:31001",
      "State" : "CLOSE_WAIT",
      "Recv-Q" : "18",
      "Proto" : "tcp",
      "Send-Q" : "0",
      "Foreign Address" : "127.0.0.1:55938",
      "PID/Program name" : "-"
   },
   {
      "Recv-Q" : "1",
      "Local Address" : "::1:53816",
      "State" : "CLOSE_WAIT",
      "Send-Q" : "0",
      "PID/Program name" : "-",
      "Foreign Address" : "::1:631",
      "Proto" : "tcp6"
   }
]

And empty fields don't break variable assignment:

./netstat2json.pl 1000.*systemd/notify
[
   {
      "Proto" : "unix",
      "I-Node" : "33378",
      "RefCnt" : "2",
      "Path" : "/run/user/1000/systemd/notify",
      "PID/Program name" : "-",
      "Type" : "DGRAM",
      "Flags" : "[ ]",
      "State" : ""
   }
]

Note: this modified version runs netstat with the -nap arguments to get the PID/Program name field.

If not run by the superuser root, you may get this output on STDERR:

(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)

You can avoid it

  • by running netstat2json.pl 2>/dev/null,
  • by running this as root or with sudo, or
  • by editing line #6 to change "netstat -nap|" to "netstat -na|".

convert_to_json.pl script to transform STDIN to json.

Here is the convert_to_json.pl perl script, strictly as requested, to be run as netstat -an | grep CLOSE | ./convert_to_json.pl 1,3 name,other:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

my (@fields,@pos,@out);

map {
    push @pos,1*$_-1
} split ",",shift @ARGV;      

map { 
    push @fields,$_
} split ",",shift @ARGV;

die "Number of fields don't match number of positions" if $#fields ne $#pos;

while (<>) {
    my @line=split(/\s+/);
    my %entry;
    for my $i (0..$#fields) {
         $entry{$fields[$i]}=$line[$pos[$i]];
    };
    push @out,\%entry;
}
print $coder->encode(\@out);

Upvotes: 6

ern0

Reputation: 3172

BASH is probably not the best platform for this, but here is a half-baked solution with some of the tricks you will need.

#!/bin/bash

function procline {
    IFS=' ' list=($1)
    echo -n "{ \"first_column\": \"${list[0]}\","
    echo " \"second_column\": \"${list[1]}\" },"
}

tr -s " " | while IFS= read -r line; do procline "$line"; done

Some explanation:

  • tr squeezes runs of spaces down to a single space
  • while IFS... feeds the result line by line to the procline function
  • the procline function first splits the line, then builds kinda-JSON.

My opinion is that the output of tr should be passed to a script written in another language, e.g. Python, PHP-CLI etc., if not the whole job; a minimal sketch follows the sample output below. That output looks pretty easy to process:

tcp4 0 0 192.168.99.1.56358 192.168.99.100.32111 CLOSE_WAIT
tcp4 31 0 192.168.100.179.56129 server-54-192-20.https CLOSE_WAIT
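
For example, a minimal Python sketch of that idea (hypothetical to_json.py, with the two netstat columns hard-coded; str.split already collapses repeated whitespace, so the tr step becomes optional):

#!/usr/bin/env python3
# to_json.py: read whitespace-separated lines from stdin,
# keep columns 1 and 3, and print a JSON array.
import json
import sys

rows = []
for line in sys.stdin:
    fields = line.split()  # split() collapses runs of whitespace by itself
    if len(fields) >= 3:
        rows.append({"name": fields[0], "other": fields[2]})

print(json.dumps(rows, indent=2))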

Upvotes: 0

Mark Setchell

Reputation: 207475

Just a really basic concept for an approach that is not at all rigorous or fully featured, but which might give you an idea of how to do the bulk of this with awk, since you don't seem to have much yet!

netstat -a | grep CLOSE_WAIT | awk 'BEGIN{print "["} {print "  {\"name\": \"",$1,"\", \"other\": \"",$2,"\"}"} END{print "]"}' OFS=""
[
  {"name": "tcp4", "other": "31"}
  {"name": "tcp4", "other": "31"}
  {"name": "tcp4", "other": "31"}
  {"name": "tcp4", "other": "31"}
  {"name": "tcp4", "other": "0"}
  {"name": "tcp4", "other": "31"}
]

I know it doesn't do the commas at the line-ends, and I know it doesn't take parameters - but both of those are solvable.

One idea for parameters would be to pass them to awk like this:

awk -v fields="1:4:7" -v headings="name:other:fred" '{...}'

then split() those in the BEGIN section and iterate through them in the main loop. That would look something like this:

echo hi | awk -v fields="1:3:5" -v headings="HeadA:HeadB:HeadC" 'BEGIN{split(headings,h,":"); split(fields,f,":")} {for(i in h)print h[i],f[i];}'

HeadA 1
HeadB 3
HeadC 5
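
Putting the two pieces together, one hedged sketch (same caveats as above; it buffers the rows so the line-end commas come out right):

netstat -a | grep CLOSE_WAIT | awk -v fields="1:3" -v headings="name:other" '
    BEGIN { n = split(fields, f, ":"); split(headings, h, ":"); print "[" }
    {
        row = ""
        for (i = 1; i <= n; i++)
            row = row sprintf("\"%s\": \"%s\"%s", h[i], $f[i], (i < n) ? ", " : "")
        rows[NR] = "  {" row "}"
    }
    END {
        for (r = 1; r <= NR; r++) print rows[r] ((r < NR) ? "," : "")
        print "]"
    }'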

Upvotes: 3
