Reputation: 2115
I have following data as input. I am trying to find the increment per group.
col1 col2 col3 group
1 2 100 alpha
1 2 100 alpha
1 2 100 alpha
3 4 200 beta
3 4 200 beta
3 4 200 beta
3 4 300 beta
5 6 700 charlie
7 8 400 tango
7 8 300 tango
7 8 700 tango
Example output:
tango: 300
charlie:0
beta:100
alpha:0
I am trying this approch but answers are incorrect as sometimes values increases in between the samples:
awk 'NR>1{print $NF}' foo |while read line;do grep -w $line foo|sort -k3n ;done |awk '!a[$4]++' |sort -k4
1 2 100 alpha
3 4 200 beta
5 6 700 charlie
7 8 300 tango
awk 'NR>1{print $NF}' foo |while read line;do grep -w $line foo|sort -k3n ;done |tac|awk '!a[$4]++' |sort -k4
1 2 100 alpha
3 4 300 beta
5 6 700 charlie
7 8 700 tango
Upvotes: 2
Views: 125
Reputation: 203645
$ cat tst.awk
NR==1 { next }
!($4 in beg) { beg[$4] = $3 }
{ end[$4] = $3 }
END {
for (grp in beg) {
print grp, end[grp] - beg[grp]
}
}
$ awk -f tst.awk file
tango 300
alpha 0
beta 100
charlie 0
Upvotes: 0
Reputation: 133528
Following awk
may help you in same too. It will provide output in same sequence as per your Input_file's last column values.
awk '
FNR==1{
next}
prev!=$NF && prev{
val=prev_val!=a[prev]?prev_val-a[prev]:0;
printf("%s %d\n",prev,val>0?val:0)}
!a[$NF]{
a[$NF]=$(NF-1)}
{
prev=$NF;
prev_val=$(NF-1)}
END{
val=prev_val!=a[prev]?prev_val-a[prev]:0;
printf("%s %d\n",prev,val>0?val:0)}
' Input_file
Output will be as follows. Will add explanation too shortly.
alpha 0
beta 100
charlie 0
tango 300
Explanation: Adding explanation of code too now for learning purposes of all.
awk '
FNR==1{ ##To skip first line of Input_file which is heading I am putting condition if FNR==1 then do next, where next will skip all further statements of awk.
next}
prev!=$NF && prev{ ##Checking conditions here if variable prev value is NOT equal to current line $NF and variable prev is NOT NULL then do following:
val=prev_val!=a[prev]?prev_val-a[prev]:0;##create a variable val, if prev_val is not equal to a[prev] then subttract prev_val and s[prev] else it will be zero.
printf("%s %d\n",prev,val>0?val:0)} ##printing the value of variable prev(which is nothing but value of last column) and then print value of val if greater than 0 or print 0 in place of val here.
!a[$NF]{ ##Checking if array a value whose index is $NF is NULL then fill it with current $NF value, actually this is to get the very first value of any column so that later we could subtract it with the its last value as per OP request.
a[$NF]=$(NF-1)}
{
prev=$NF; ##creating variable named prev and assigning its value to last column of the current line.
prev_val=$(NF-1)} ##creating variable named prev_val whose value will be second last columns value of current line.
END{ ##starting end block of awk code here, it will come when Input_file is done with reading.
val=prev_val!=a[prev]?prev_val-a[prev]:0;##getting value of variable val where checking if prev_val is not equal to a[prev] then subtract prev_val and s[prev] else it will be zero.
printf("%s %d\n",prev,val>0?val:0)} ##printing the value of variable prev(which is nothing but value of last column) and then print value of val if greater than 0 or print 0 in place of val here.
' Input_file ##Mentioning the Input_file name here.
Upvotes: 1
Reputation: 26481
The following should do the trick, this solution does not require the file to be sorted by group name.
awk '(NR==1){next}
{groupc[$4]++}
(groupc[$4]==1){groupv[$4]=$3}
{groupl[$4]=$3}
END{for(i in groupc) { print i":",groupl[i]-groupv[i]} }
' foo
The following things happen :
(NR==1){next}
{groupc[$4]++}
1
define its first value under groupv
groupl
output :
tango: 300
alpha: 0
beta: 100
charlie: 0
Upvotes: 2
Reputation: 92854
Awk
solution:
awk 'NR==1{ next }
g && $4 != g{ print g":"(v - gr[g]) }
!($4 in gr){ gr[$4]=$3 }{ g=$4; v=$3 }
END{ print g":"(v - gr[g]) }' file
NR==1{ next }
- skip the 1st recordg
- variable aimed to hold group namev
- variable aimed to hold group value!($4 in gr){ gr[$4]=$3 }
- on the 1st occurrence of a distinct group name $4
- save its first value $3
into array gr
g && $4 != g{ print g":"(v - gr[g]) }
- if the current group name $4
differs from the previous one g
- print the delta between the last and 1st values of the previous groupThe output:
alpha:0
beta:100
charlie:0
tango:300
Upvotes: 3