Column Operations in file Linux Shell

Question

I have a file with entries seperated by an empty space. For example:

example.txt

24676 256 218503341 2173
13236272 500 1023073758 5089
2230304 96 15622969 705
0 22 0 526
13277 28 379182 141

I would like to print, in the command line, the outcome of "column 1/ column 3" or simila. I believe it can be done with awk. However, some entries are 0, hence division by 0 gives:

fatal: division by zero attempted

In a more advanced case, I would like to find the median value (or some percentile) of the division.

Jonathan Leffler · Accepted Answer

There are many ways to ignore the row with a zero divisor, including:

awk '$3 != 0 { print $1/$3 }' your-data-file

awk '{ if ($3 != 0) print $1/$3 }' your-data-file

The question changed — to print 0 instead. The answer is not much harder:

awk '{ if ($3 != 0) print $1/$3; else print 0 }' your-data-file

Medians and other percentiles are much fiddlier to deal with. It's easiest if the data is in sorted order. So much easier that I'd expect to use a numeric sort and then process the data from there.

I dug out an old shell script which computes descriptive statistics - min, max, mode, median, and deciles of a single numeric column of data:

:   "@(#)$Id: dstats.sh,v 1.2 1997/06/02 21:45:00 johnl Exp $"
#
#   Calculate Descriptive Statistics: min, max, median, mode, deciles

sort -n $* |
awk 'BEGIN { max = -999999999; min = 999999999; }
    {   # Accumulate basic data
        count[$1]++;
        item[++n] = $1;
        if ($1 > max) max = $1;
        if ($1 < min) min = $1;
    }
END {   # Print Descriptive Statistics
        printf("# Count = %d
", n);
        printf("# Min = %d
", min);
        decile = 1;
        for (decile = 10; decile < 100; decile += 10)
        {
            idx = int((decile * n) / 100) + 1;
            printf("# %d%% decile = %d
", decile, item[idx]);
            if (decile == 50)
                median = item[idx];
        }
        printf("# Max = %d
", max);

        printf("# Median = %d
", median);
        for (i in count)
        {
            if (count[i] > count[mode])
                mode = i;
        }
        printf("# Mode = %d
", mode);
    }'

The initial values of min and max are not exactly scientific. It serves to illustrate a point.

(This 1997 version is almost identical to its 1991 predecessor - all except for the version information line is identical, in fact. So, the code is over 20 years old.)

Column Operations in file Linux Shell

Answers (2)

Related Questions