Reputation: 49

numbering unique values in a column

I have a file where column one has a list of family identifiers

AB
AB
AB
AB
SAR
SAR
EAR

is there a way that I can create a new column where each repeat is numbered creating a new label for each repeat i.e.

AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1

I am able to do this in SAS but am looking for a bash option (possibly awk)

data file;
set file; 
count+1;
by name;
if first.name then count=1;
new_name=compress(name||'_'||count);
run;

Upvotes: 2

Answers (2)

Reputation: 195229

try this one-liner:

awk '{a[$0]++;print $0"_"a[$0]}' file

with your input data:

kent$ echo "AB
AB
AB
AB
SAR
SAR
EAR"|awk '{a[$0]++;print $0"_"a[$0]}'
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1

Upvotes: 2

Reputation: 85883

$ awk '{print $1"_"++a[$1]}' file
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1

Upvotes: 4