DSTO

Reputation: 285

Grouping duplicated fields with awk

I have the following file:

ID|2018-04-29
ID|2018-04-29
ID|2018-04-29
ID1|2018-06-26
ID1|2018-06-26
ID1|2018-08-07
ID1|2018-08-22

and using awk, I want to add $3 that groups the duplicated IDs based on $1 and $2 so that the output would be

ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4

I tried the following code, but it does not give me the desired output. I am also not sure whether it can be applied to a column that contains dates.

awk -F"|" '{print $0,"group"++seen[$1,$3]}' OFS="|"

Any hints on how to achieve it using awk (one-liner, if possible) would be highly appreciated.

Upvotes: 4

Views: 291

Answers (3)

nsonnson

Reputation: 1

BEGIN   {OFS = FS = "|"}

{ if ($0 != prev) {          #new item
    prev = $0                            
    print $1, $2, "group" ++g
  } 
  else {
    print $1, $2, "group" g
  }
}

Note that the list has to be sorted, or at least have identical lines next to each other (from your example, I assume it is). This is my first time posting an answer here. I hope the code is readable and that it helps.
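
If the input is not already ordered, one option is to sort it first so that identical lines end up next to each other. This is only a minimal sketch, assuming a POSIX shell; the function name group_sorted is made up for illustration. Note that the output then follows the sorted order, so the group numbers are assigned in that order rather than in the original line order.

```shell
# group_sorted is a hypothetical helper name, not part of the answer above.
# It sorts the file so identical lines are adjacent, then numbers each run.
group_sorted() {
  LC_ALL=C sort "$1" | awk '
    BEGIN { OFS = FS = "|" }
    NR == 1 || $0 != prev { prev = $0; ++g }   # a new run of identical lines starts here
    { print $1, $2, "group" g }
  '
}
```

It would be called as group_sorted file.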

Upvotes: 0

Akshay Hegde

Reputation: 16997

"and using awk, I want to add $3 that groups the duplicated IDs based on $1 and $2 so that the output would be"

Using $1 and $2

If the input file is sorted (equal $1, $2 pairs are adjacent):

$ awk 'BEGIN{FS=OFS="|"}{print $0, "group" (!a[$1,$2]++?++c:c)}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4

If the file is not sorted:

$ awk 'BEGIN{FS=OFS="|"}{k=$1 SUBSEP $2}!(k in a){a[k]=++c}{print $0, "group" a[k]}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4

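A more readable version:

awk 'BEGIN {
       FS = OFS = "|"
     }
     {
       k = $1 SUBSEP $2          # composite key built from the first two fields
     }
     !(k in a) {
       a[k] = ++c                # first time this key is seen: give it the next group number
     }
     {
       print $0, "group" a[k]
     }' file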
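
To see where the two variants differ, here is a small sketch on input where a key comes back after a different key. The sample data and the function names group_adjacent and group_any_order are made up for illustration; the awk programs are the two one-liners from this answer, reading from standard input.

```shell
# Hypothetical sample: the key A|2018-01-01 reappears after another key.
sample='A|2018-01-01
B|2018-02-01
A|2018-01-01'

# Counter-based version: only correct when equal keys are adjacent.
group_adjacent() {
  awk 'BEGIN{FS=OFS="|"}{print $0, "group" (!a[$1,$2]++?++c:c)}'
}

# Lookup version: remembers the number assigned to each key.
group_any_order() {
  awk 'BEGIN{FS=OFS="|"}{k=$1 SUBSEP $2}!(k in a){a[k]=++c}{print $0, "group" a[k]}'
}

printf '%s\n' "$sample" | group_adjacent
# A|2018-01-01|group1
# B|2018-02-01|group2
# A|2018-01-01|group2   <- reuses the current counter, not the key's own group

printf '%s\n' "$sample" | group_any_order
# A|2018-01-01|group1
# B|2018-02-01|group2
# A|2018-01-01|group1
```

The first form prints whatever the counter currently holds for a key it has already seen, which is why it needs grouped input. The second stores the number per key, at the cost of keeping one array entry per distinct key.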

Upvotes: 3

RavinderSingh13

Reputation: 133770

With your shown samples, please try the following awk code.

awk -v OFS="|" '!arr[$0]++{count++} {print $0,"group"count}' Input_file

Explanation: a commented version of the above (OFS is set in a BEGIN block here instead of with -v, which has the same effect).

awk '                     ## Start of the awk program.
BEGIN{                    ## BEGIN block: runs once before any input is read.
  OFS="|"                 ## Set the output field separator to |.
}
!arr[$0]++{               ## True only the first time a given line is seen.
  count++                 ## A new line value starts a new group, so increment count.
}
{
  print $0,"group"count   ## Print the line followed by group and the current count.
}
' Input_file              ## Name of the input file.
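
The code above keys on the whole line ($0), which matches the shown samples because each line consists only of $1 and $2. If the real file had further columns, a variant keyed on the first two fields might look like the sketch below. The function name group_by_first_two and the extra third column in the usage example are assumptions for illustration, and like the original it still expects equal keys to be adjacent.

```shell
# group_by_first_two is a hypothetical name; FS is set so $1 and $2 are the
# first two |-separated fields, and the counter only advances on a new pair.
group_by_first_two() {
  awk -F'|' -v OFS='|' '!seen[$1,$2]++ { count++ } { print $0, "group" count }' "$@"
}
```

For example, printf 'ID|2018-04-29|x\nID|2018-04-29|y\n' | group_by_first_two would put both lines in group1 even though the lines themselves differ.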

Upvotes: 4
