Reputation: 2115

Detect increment made in any column

I have following data as input. I am trying to find the increment per group.

col1    col2    col3    group
1        2      100  alpha
1        2      100  alpha
1        2      100  alpha
3        4      200  beta
3        4      200  beta
3        4      200  beta
3        4      300  beta
5        6      700  charlie
7        8      400  tango
7        8      300  tango
7        8      700  tango

Example output:

tango: 300
charlie:0
beta:100
alpha:0

I am trying this approch but answers are incorrect as sometimes values increases in between the samples:

awk 'NR>1{print $NF}' foo |while read line;do grep -w $line foo|sort -k3n ;done |awk '!a[$4]++' |sort -k4
1       2       100     alpha
3       4       200     beta
5       6       700     charlie
7       8       300     tango
awk 'NR>1{print $NF}' foo |while read line;do grep -w $line foo|sort -k3n ;done |tac|awk '!a[$4]++' |sort -k4
1       2       100     alpha
3       4       300     beta
5       6       700     charlie
7       8       700     tango

Upvotes: 2

Answers (4)

Ed Morton

Reputation: 203645

$ cat tst.awk
NR==1 { next }
!($4 in beg) { beg[$4] = $3 }
{ end[$4] = $3 }
END {
    for (grp in beg) {
        print grp, end[grp] - beg[grp]
    }
}

$ awk -f tst.awk file
tango 300
alpha 0
beta 100
charlie 0

Upvotes: 0

RavinderSingh13

Reputation: 133528

Following awk may help you in same too. It will provide output in same sequence as per your Input_file's last column values.

awk '
FNR==1{
  next}
prev!=$NF && prev{
  val=prev_val!=a[prev]?prev_val-a[prev]:0;
  printf("%s %d\n",prev,val>0?val:0)}
!a[$NF]{
  a[$NF]=$(NF-1)}
{
  prev=$NF;
  prev_val=$(NF-1)}
END{
  val=prev_val!=a[prev]?prev_val-a[prev]:0;
  printf("%s %d\n",prev,val>0?val:0)}
'   Input_file

Output will be as follows. Will add explanation too shortly.

alpha 0
beta 100
charlie 0
tango 300

Explanation: Adding explanation of code too now for learning purposes of all.

awk '
FNR==1{                                    ##To skip first line of Input_file which is heading I am putting condition if FNR==1 then do next, where next will skip all further statements of awk.
  next}
prev!=$NF && prev{                         ##Checking conditions here if variable prev value is NOT equal to current line $NF and variable prev is NOT NULL then do following:
  val=prev_val!=a[prev]?prev_val-a[prev]:0;##create a variable val, if prev_val is not equal to a[prev] then subttract prev_val and s[prev] else it will be zero.
  printf("%s %d\n",prev,val>0?val:0)}      ##printing the value of variable prev(which is nothing but value of last column) and then print value of val if greater than 0 or print 0 in place of val here.
!a[$NF]{                                   ##Checking if array a value whose index is $NF is NULL then fill it with current $NF value, actually this is to get the very first value of any column so that later we could subtract it with the its last value as per OP request.
  a[$NF]=$(NF-1)}
{
  prev=$NF;                                ##creating variable named prev and assigning its value to last column of the current line.
  prev_val=$(NF-1)}                        ##creating variable named prev_val whose value will be second last columns value of current line.
END{                                       ##starting end block of awk code here, it will come when Input_file is done with reading.
  val=prev_val!=a[prev]?prev_val-a[prev]:0;##getting value of variable val where checking if prev_val is not equal to a[prev] then subtract prev_val and s[prev] else it will be zero.
  printf("%s %d\n",prev,val>0?val:0)}      ##printing the value of variable prev(which is nothing but value of last column) and then print value of val if greater than 0 or print 0 in place of val here.
' Input_file                               ##Mentioning the Input_file name here.

Upvotes: 1

kvantour

Reputation: 26481

The following should do the trick, this solution does not require the file to be sorted by group name.

awk '(NR==1){next}
     {groupc[$4]++}
     (groupc[$4]==1){groupv[$4]=$3}
     {groupl[$4]=$3}
     END{for(i in groupc) { print i":",groupl[i]-groupv[i]} }
    ' foo

The following things happen :

skip the first line (NR==1){next}
count how many time group is occuring {groupc[$4]++}
if the group count equal 1 define its first value under groupv
define the last seen value as groupl
at the END, run over all array keys (which are the groups), and print the last minus the first value.

output :

tango: 300
alpha: 0
beta: 100
charlie: 0

Upvotes: 2

RomanPerekhrest

Reputation: 92854

Awk solution:

awk 'NR==1{ next }
     g && $4 != g{ print g":"(v - gr[g]) }
     !($4 in gr){ gr[$4]=$3 }{ g=$4; v=$3 }
     END{ print g":"(v - gr[g]) }' file

NR==1{ next } - skip the 1st record
g - variable aimed to hold group name
v - variable aimed to hold group value
!($4 in gr){ gr[$4]=$3 } - on the 1st occurrence of a distinct group name $4 - save its first value $3 into array gr
g && $4 != g{ print g":"(v - gr[g]) } - if the current group name $4 differs from the previous one g - print the delta between the last and 1st values of the previous group

The output:

alpha:0
beta:100
charlie:0
tango:300

Upvotes: 3

Detect increment made in any column

Answers (4)

Related Questions