Reputation: 1118
I have a csv file like this
KEY,F1,F2,STEP,LAST_OCCURRENCE
100.101,a,b,STEP_1,<empty>
100.102,c,d,STEP_1,<empty>
100.103,e,f,STEP_1,<empty>
100.101,g,h,STEP_1,<empty>
100.103,i,j,STEP_1,<empty>
100.101,g,h,STEP_2,<empty>
100.103,i,j,STEP_2,<empty>
I am able to change the final field to whatever is easiest to parse so it can be considered as either blank i.e ,\n
or containing the word <empty>
as above.
From this file I have to replace "LAST_OCCURRENCE" field matching with last occurrence of [ KEY + STEP ] value with a boolean value (indicating that it's the last value for the tuple).
The expected result is this one:
KEY,F1,F2,STEP,LAST_OCCURRENCE
100.101,a,b,STEP_1,false
100.102,c,d,STEP_1,true #Last 100.102 for STEP_1
100.103,e,f,STEP_1,false
100.101,g,h,STEP_1,true #Last 100.101 for STEP_1
100.103,i,j,STEP_1,true #Last 100.103 for STEP_1
100.101,g,h,STEP_2,true #Last 100.101 for STEP_2
100.103,i,j,STEP_2,true #Last 100.103 for STEP_2
Which is the fastest approach? Would be possible to do it with a sed script or would be better to post-process the input file with another (perl? php?) script?
Upvotes: 2
Views: 151
Reputation: 203522
$ awk 'BEGIN{FS=OFS=","} NR==FNR{last[$1,$4]=NR;next} FNR>1{$NF=(FNR==last[$1,$4] ? "true" : "false")} 1' file file
KEY,F1,F2,STEP,LAST_OCCURRENCE
100.101,a,b,STEP_1,false
100.102,c,d,STEP_1,true
100.103,e,f,STEP_1,false
100.101,g,h,STEP_1,true
100.103,i,j,STEP_1,true
100.101,g,h,STEP_2,true
100.103,i,j,STEP_2,true
Upvotes: 1
Reputation: 785156
Using tac
and awk
:
tac file |
awk 'BEGIN{FS=OFS=","} $1 != "KEY"{$NF = (seen[$1,$4]++) ? "false" : "true"} 1' |
tac
After listing the file in reverse order using tac
, we use an associative array seen
with composite key as $1,$4
to figure out first occurrence of each composite key. Finally we do tac
to get the file back in original order.
Output:
KEY,F1,F2,STEP,LAST_OCCURRENCE
100.101,a,b,STEP_1,false
100.102,c,d,STEP_1,true
100.103,e,f,STEP_1,false
100.101,g,h,STEP_1,true
100.103,i,j,STEP_1,true
100.101,g,h,STEP_2,true
100.103,i,j,STEP_2,true
Upvotes: 2