Sort and remove duplicates

Question

Could you please help me solve this problem?

I would like to sort on columns 1 and 2 so that I can delete the duplicates in column 1, always keeping the first 2 records.

The goal of the sort is for the kept records to have different numbers in the second column, not the same number.

Example:

I have this:

3039949085;19;1195616938480000;1  ;V2
3039949085;19;1195616938480000;2  ;V2
3039949085;30;1195616938480000;2  ;V2

After sorting, it should be:

3039949085;19;1195616938480000;1  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949085;19;1195616938480000;2  ;V2

I use this code:

sort -t';' -k1,2n -k4 file | gawk -F';' 'a[$1]++<2'

My input file is:

2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
2995347947;6;1195617034732000;2  ;V3
2995347947;9;1195617034732000;2  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039948785;14;1195616747632000;2  ;V2
3039948785;25;1195616747632000;2  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;19;1195615542032000;2  ;V2
3039949061;30;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;19;1195616109632000;2  ;V2
3039949073;30;1195616109632000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949085;19;1195616938480000;2  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2
3039949373;10;1195615559208000;2  ;V2

Output I got:

2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;19;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;19;1195616109632000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949085;19;1195616938480000;2  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2
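A likely explanation for this output, sketched on the 3039949061 rows of the input file: `a[$1]++<2` is true for the first two lines of each column-1 value whatever column 2 holds. In the sort, `-k4` puts the row ending in `1  ;V2` ahead of the two `2  ;V2` rows, and the tie between those two is broken by comparing whole lines, which puts `19` before `30`. So both `19` lines are the first two and are kept. The snippet uses plain `awk` instead of `gawk` (the one-liner needs no gawk extensions) and assumes `LC_ALL=C` so the order is reproducible.

```shell
#!/bin/sh
# Reproduce the question's pipeline on the 3039949061 rows of the input file.
# Assumption: LC_ALL=C, so sort compares bytes and the order is reproducible.
LC_ALL=C; export LC_ALL

input='3039949061;19;1195615542032000;1  ;V2
3039949061;19;1195615542032000;2  ;V2
3039949061;30;1195615542032000;2  ;V2'

# a[$1]++<2 is true for the first two lines of each column-1 value,
# whatever column 2 contains (plain awk is used here instead of gawk).
got=$(printf '%s\n' "$input" | sort -t';' -k1,2n -k4 | awk -F';' 'a[$1]++<2')
printf '%s\n' "$got"
```

This prints the two `19` lines, as in the output above, because the `30` row sorts third and is dropped.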

But I would like to get the following output:

2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;30;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;30;1195616109632000;2  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2

My problem is in the sort step.

Appreciate your help.

anubhava · Accepted Answer

You can use this awk command to print only the first line for each unique combination of $1 and $2:

awk -F';' '!a[$1,$2]++'
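How the idiom works, as a small sketch on the three 3039949085 rows from the question: `a[$1,$2]` is an array element indexed by columns 1 and 2 (awk joins the two subscripts with `SUBSEP`), and the post-increment returns the old count, which is 0 the first time a pair is seen. So `!a[$1,$2]++` is true only for the first line of each pair. awk keeps whichever line it sees first, which is why the preceding sort still decides which line survives.

```shell
#!/bin/sh
# Demo of awk's "first occurrence per key" idiom on three rows from the question.
# a[$1,$2]   -> element keyed by column 1 and column 2 (joined with SUBSEP)
# a[...]++   -> returns the previous count (0 the first time), then increments it
# !a[...]++  -> true only the first time a given ($1,$2) pair is seen
got=$(printf '%s\n' \
  '3039949085;19;1195616938480000;1  ;V2' \
  '3039949085;19;1195616938480000;2  ;V2' \
  '3039949085;30;1195616938480000;2  ;V2' |
  awk -F';' '!a[$1,$2]++')
printf '%s\n' "$got"
```

This prints the `19 ... 1` line and the `30 ... 2` line; the second `19` line is dropped because its pair was already seen.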

Full example:

sort -t';' -k1,2n -k4 file | awk -F';' '!a[$1,$2]++'

2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;30;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;30;1195616109632000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2
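To check the answer end to end, the sketch below feeds the question's input through the accepted pipeline and compares the result with the output above. It also tries a variant that is an assumption of this note, not part of the answer. First, `-k1,1n -k2,2n` makes column 2 its own numeric key; with `-k1,2n` the single numeric key is parsed only up to the first `;`, so column 2 is compared only as text in sort's fallback whole-line comparison. Second, `n[$1]++ < 2` additionally caps the output at two lines per column-1 value, as the question asked. `LC_ALL=C` is assumed for a reproducible sort.

```shell
#!/bin/sh
# Assumption: LC_ALL=C, so sort compares bytes and the order is reproducible.
LC_ALL=C; export LC_ALL

# The question's input file.
input=$(cat <<'EOF'
2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
2995347947;6;1195617034732000;2  ;V3
2995347947;9;1195617034732000;2  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039948785;14;1195616747632000;2  ;V2
3039948785;25;1195616747632000;2  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;19;1195615542032000;2  ;V2
3039949061;30;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;19;1195616109632000;2  ;V2
3039949073;30;1195616109632000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949085;19;1195616938480000;2  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2
3039949373;10;1195615559208000;2  ;V2
EOF
)

# The output shown in the accepted answer.
expected=$(cat <<'EOF'
2995347947;6;1195617034732000;1  ;V3
2995347947;9;1195617034732000;1  ;V3
3039948773;14;1195616284532000;1  ;V2
3039948785;14;1195616747632000;1  ;V2
3039948785;25;1195616747632000;1  ;V2
3039949061;19;1195615542032000;1  ;V2
3039949061;30;1195615542032000;2  ;V2
3039949073;19;1195616109632000;1  ;V2
3039949073;30;1195616109632000;2  ;V2
3039949085;19;1195616938480000;1  ;V2
3039949085;30;1195616938480000;2  ;V2
3039949373;10;1195615559208000;1  ;V2
3039949373;11;1195615559208000;1  ;V2
EOF
)

# The accepted answer: sort, then keep the first line per ($1,$2) pair.
answer=$(printf '%s\n' "$input" | sort -t';' -k1,2n -k4 | awk -F';' '!a[$1,$2]++')

# Variant (assumption): separate numeric keys for columns 1 and 2, and print a
# line only if its ($1,$2) pair is new AND fewer than two new pairs have been
# seen for that $1. && short-circuits, so n[$1] is only incremented on new pairs.
variant=$(printf '%s\n' "$input" |
  sort -t';' -k1,1n -k2,2n -k4 |
  awk -F';' '!seen[$1,$2]++ && n[$1]++ < 2')

printf '%s\n' "$answer"
```

On this input both pipelines produce the same 13 lines. The cap only makes a difference when a column-1 value has three or more distinct column-2 values, which does not happen in the sample data.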
