Reputation: 285
I have the following file:
ID|2018-04-29
ID|2018-04-29
ID|2018-04-29
ID1|2018-06-26
ID1|2018-06-26
ID1|2018-08-07
ID1|2018-08-22
and using awk, I want to add $3 that groups the duplicated IDs based on $1 and $2, so that the output would be
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
I tried the following code, but it does not give me the desired output. I am also not sure whether it can be applied to a column that contains dates.
awk -F"|" '{print $0,"group"++seen[$1,$3]}' OFS="|"
Any hints on how to achieve it using awk (one-liner, if possible) would be highly appreciated.
Upvotes: 4
Views: 291
Reputation: 1
BEGIN { OFS = FS = "|" }
{
    if ($0 != prev) {    # new item: start the next group
        prev = $0
        print $1, $2, "group" ++g
    }
    else {               # same line as before: reuse the current group
        print $1, $2, "group" g
    }
}
Note that the input has to be sorted so that identical lines are adjacent (from your example, I assume it is). This is my first time posting an answer here; I hope the code is readable and that it helps.
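If the input is not already sorted, one option is to sort it on both fields before grouping. The following is only a sketch under that assumption: the function name group_sorted and the file name input.txt are made up for illustration, and the grouping logic is a condensed form of the script above.

```shell
# group_sorted (hypothetical helper): sort on both fields, then number each
# run of identical lines. Pass the input file as the first argument.
group_sorted() {
  sort -t'|' -k1,1 -k2,2 "$1" |
    awk 'BEGIN { OFS = FS = "|" }
         $0 != prev { prev = $0; ++g }   # a different line starts a new group
         { print $1, $2, "group" g }'
}
```

Usage would be something like `group_sorted input.txt`. Keep in mind that this changes the order of the lines, so the group numbers follow the sorted order rather than the original one.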
Upvotes: 0
Reputation: 16997
and using awk, I want to add $3 that groups the duplicated IDs based on $1 and $2 so that the output would be
Using $1 and $2:
If the input file is sorted:
$ awk 'BEGIN{FS=OFS="|"}{print $0, "group" (!a[$1,$2]++?++c:c)}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
If the file is not sorted:
$ awk 'BEGIN{FS=OFS="|"}{k=$1 SUBSEP $2}!(k in a){a[k]=++c}{print $0, "group" a[k]}' file
ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4
A more readable version:
awk 'BEGIN{
FS=OFS="|"
}
{
k=$1 SUBSEP $2
}
!(k in a){
a[k]=++c
}
{
print $0, "group" a[k]
}' file
Upvotes: 3
Reputation: 133770
With your shown samples, please try the following awk code.
awk -v OFS="|" '!arr[$0]++{count++} {print $0,"group"count}' Input_file
Explanation: a detailed, commented version of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
OFS="|" ##Setting OFS to | here.
}
!arr[$0]++{ ##Checking if current line is NOT present in array then do following.
count++ ##Increasing count with 1 here.
}
{
print $0,"group"count ##Printing current line with group and count value here.
}
' Input_file ##Mentioning Input_file name here.
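The `!arr[$0]++` condition may be easier to follow in isolation. As a sketch (the function name first_seen is made up for illustration), the same idiom with no action attached simply prints each distinct line once:

```shell
# first_seen (hypothetical helper): print every distinct line only once,
# in order of first appearance.
first_seen() {
  # arr[$0] is 0 the first time a line is seen, so !arr[$0] is true;
  # the post-increment then makes it non-zero for any later repeat.
  awk '!arr[$0]++' "$@"
}
```

For example, `printf 'a\nb\na\n' | first_seen` prints a and then b. In the answer above the same test is what triggers count++, so the counter goes up only when a line appears for the first time.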
Upvotes: 4