Reputation: 5044
I have a file like this:
91052011868;Export Equi_Fort Postal;EXPORT;23/02/2015;1;0;0
91052011868;Sof_equi_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052011868;Sof_trav_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0
91052182019;Export Deme_fort temoin;EXPORT;24/02/2015;1;0;0
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
91052262558;Sof_deme_faible_Email_am;EMAIL;26/01/2015;1;0;1
91052265940;Sof_trav_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;13/02/2015;1;0;0
91052265940;Sof_trav_Faible_Email_Relance_am_%yyyy%%mm%%dd%;EMAIL;17/02/2015;1;0;0
91052265940;Sof_voya_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;13/02/2015;1;0;0
91052265940;Sof_voya_Faible_Email_Relance_am_%yyyy%%mm%%dd%;EMAIL;16/02/2015;1;0;0
91052531428;Export Trav_faible temoin;EXPORT;11/02/2015;1;0;0
91052547697;Export Deme_Faible Postal;EXPORT;27/02/2015;1;0;0
91052562398;Export Deme_faible temoin;EXPORT;18/02/2015;1;0;0
I want to print all the lines whose first-column value occurs more than once but strictly fewer than 3 times in the file (that is, exactly twice). Expected output:
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
I tried the command below, but it doesn't work...
sort file | awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 0 && a[$1] <1 )print $0;}' file file
Why?
Upvotes: 1
Views: 57
Reputation: 289505
If what you want is to print all those lines whose first field appears twice, you can use this:
$ awk -F";" 'FNR==NR{a[$1]++; next} a[$1]==2' file file
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
This sets the field separator to the semicolon and then reads the file twice:
- the first time to count how many times each value of the 1st field appears (a[$1]++)
- the second time to print those lines matching the condition a[$1]==2, that is, the lines whose first field appears exactly twice throughout the file.
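To make the two passes concrete, here is a minimal sketch run against a few lines taken from the question. The file name sample.txt and the array name count are only illustrative choices, not anything required by awk.

```shell
# Sample data: three IDs from the question, occurring 3, 1 and 2 times.
# sample.txt is a hypothetical file name used only for this sketch.
cat > sample.txt <<'EOF'
91052011868;Export Equi_Fort Postal;EXPORT;23/02/2015;1;0;0
91052011868;Sof_equi_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052011868;Sof_trav_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;19/02/2015;1;0;0
91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0
91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0
91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0
EOF

# Pass 1 (FNR==NR is only true while reading the first file argument):
#   count each first field, then skip to the next line.
# Pass 2: a bare condition prints the lines whose key was counted exactly twice.
result=$(awk -F';' 'FNR==NR { count[$1]++; next } count[$1] == 2' sample.txt sample.txt)
printf '%s\n' "$result"
```

On this sample it prints only the two 91052199517 lines, in their original order, because the second pass walks the file as it is.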
If you wanted those values appearing between 2 and 4 times, you could use the following condition in the second pass:
a[$1]>=2 && a[$1]<=4
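If the bounds need to change often, one possible variation is to pass them in with -v instead of hard-coding them. This is only a sketch: the variable names min and max and the file name sample.txt are my own choices for illustration.

```shell
# Sample data: IDs occurring 3, 1 and 2 times (sample.txt is a hypothetical name).
printf '%s\n' \
  '91052011868;Export Equi_Fort Postal;EXPORT;23/02/2015;1;0;0' \
  '91052011868;Sof_equi_Fort_Email_am;EMAIL;19/02/2015;1;0;0' \
  '91052011868;Sof_trav_Fort_Email_am;EMAIL;19/02/2015;1;0;0' \
  '91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0' \
  > sample.txt

# min and max are awk variables set from the command line with -v.
# Lines are printed when their first field occurs between min and max times.
ranged=$(awk -F';' -v min=2 -v max=4 \
  'FNR==NR { count[$1]++; next } count[$1] >= min && count[$1] <= max' \
  sample.txt sample.txt)
printf '%s\n' "$ranged"
```

With this sample, every line is printed except the one for 91052151371, whose ID occurs only once.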
As for why yours doesn't work: your condition says:
if (a[$1] > 0 && a[$1] <1 )
which of course will never happen, since a[$1] is an integer and no integer is bigger than 0 and smaller than 1.
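A quick way to check this is to print the counter for every key and then run the condition from the question unchanged. The sketch below assumes a small sample file (sample.txt is a hypothetical name). It may also be worth noting that the original command has no -F, so awk splits fields on whitespace rather than on semicolons, and that the output of sort is never read, because awk is given file names to read instead.

```shell
# Sample data: IDs occurring 3, 1 and 2 times (sample.txt is a hypothetical name).
printf '%s\n' \
  '91052011868;Export Equi_Fort Postal;EXPORT;23/02/2015;1;0;0' \
  '91052011868;Sof_equi_Fort_Email_am;EMAIL;19/02/2015;1;0;0' \
  '91052011868;Sof_trav_Fort_Email_am;EMAIL;19/02/2015;1;0;0' \
  '91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0' \
  > sample.txt

# Print "key count" for every distinct first field; the counts are whole numbers >= 1.
counts=$(awk -F';' '{ count[$1]++ } END { for (k in count) print k, count[k] }' sample.txt | sort)
printf '%s\n' "$counts"

# The awk program from the question, unchanged: no count is ever between 0 and 1,
# so the condition is false for every line and nothing is printed.
original=$(awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 0 && a[$1] <1 )print $0;}' sample.txt sample.txt)
if [ -z "$original" ]; then
  echo "the original condition matched no lines"
fi
```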
Note that my proposed solution uses the same idea, only in a slightly more idiomatic way: there is no need for an explicit if condition, nor for print $0, since printing the current line is exactly what awk does by default when a condition evaluates to true.
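To illustrate the equivalence, the sketch below runs the pattern-only form and the explicit if/print form on the same input and compares their output. The file name sample.txt and the shell variable names are assumptions made for this example.

```shell
# Sample data: IDs occurring 1 and 2 times (sample.txt is a hypothetical name).
printf '%s\n' \
  '91052151371;Export Trav_faible temoin;EXPORT;12/02/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_pm;EMAIL;22/01/2015;1;0;0' \
  '91052199517;Sof_voya_Faible_Email_Relance_pm;EMAIL;26/01/2015;1;0;0' \
  > sample.txt

# Pattern only: when the pattern is true and no action is given, awk prints the line.
implicit=$(awk -F';' 'FNR==NR { a[$1]++; next } a[$1] == 2' sample.txt sample.txt)

# The same logic spelled out with an explicit if and print $0.
explicit=$(awk -F';' 'FNR==NR { a[$1]++; next } { if (a[$1] == 2) print $0 }' sample.txt sample.txt)

if [ "$implicit" = "$explicit" ]; then
  echo "both forms print the same lines"
fi
```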
Upvotes: 2