Command-line tool to group textual data according to a group key

Question

I'd like to find a text-processing utility that groups all the values of an attribute for the same primary key. The environment is Linux.

Consider a text file that consists of "records", each record being one line of the file. Each record is a space-separated sequence of numerical values: one of them is the primary key, and the others are either additional properties of the primary key or attributes computed for it. Example:

pkey pkey-prop1 pkey-prop2 attr1 attr2 attr3 attr4
100 200 400 0.1 0.2 0.3 0.4
100 200 400 0.2 0.7 0.4 0.5
100 200 400 0.3 0.4 0.5 0.6
101 200 401 0.7 0.8 0.9 1.0
101 200 401 0.8 0.9 1.0 1.1
101 200 401 0.9 1.7 1.1 1.2

By specifying which columns play the roles of pkey, property, and attribute, I'd like to group the values of a given attribute across all records that share the same primary key. For example, with pkey=$1, property=$2 $3, attribute=$5, the result would be:

100 200 400 0.2 0.7 0.4
101 200 401 0.8 0.9 1.7

That is, the attribute values from all lines with pkey=100 are grouped into a single line, and those from all lines with pkey=101 are grouped into another line.

I'm not expecting an exact tool to exist, but I'd be very happy with one that at least does the grouping.

kev · Accepted Answer

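awk '
# Same key as the previous record: append the attribute to the current line.
x==$1 && y==$2 && z==$3 {
    printf(" %s", $5)
    next
}

# New key: remember it and start a new output line.
{
    x=$1
    y=$2
    z=$3
    # Terminate the previous output line, except before the very first record.
    printf("%s%s %s %s %s", NR==1?"":"\n", x,y,z,$5)
}

# Terminate the last output line.
END{
    print ""
}' input.txt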

100 200 400 0.2 0.7 0.4
101 200 401 0.8 0.9 1.7
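
The script above compares each record only with the one before it, so it relies on all records for a given key being adjacent in the file. If that isn't guaranteed, a variant using awk's associative arrays may help. This is only a minimal sketch: group_by_key is a hypothetical helper name, the key is assumed to be columns 1 to 3 as in the question, and the attribute column is passed as an argument.

```shell
# group_by_key is a hypothetical helper, not a standard tool.
# Usage: group_by_key ATTR_COLUMN FILE
# Assumes columns 1-3 form the key (pkey plus its two properties).
group_by_key() {
    awk -v attr="$1" '
    {
        key = $1 " " $2 " " $3
        if (!(key in vals)) {
            order[++n] = key                 # remember keys in first-seen order
            vals[key] = $attr
        } else {
            vals[key] = vals[key] " " $attr  # append to the existing group
        }
    }
    END {
        # Print one line per key: the key followed by its collected values.
        for (i = 1; i <= n; i++)
            print order[i], vals[order[i]]
    }' "$2"
}
```

Calling group_by_key 5 input.txt would group column 5. Groups are printed in the order their keys first appear, and the whole result is held in memory until the end of input, so this trades streaming for independence from record order. If the file starts with a header line, it would be treated as a group of its own unless it is skipped first.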
