Reputation: 724
I would like to remove the duplicate values in column 1, keeping only the first instance of each, while leaving the rest of the columns untouched.
input
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
444444 116,118,124-125,120,122-123,126,132.
444444 25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
444444 110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 116,118,124-125,120,122-123,126,132.
output
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122-123,126,132.
       25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122.
       21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122-123,126,132.
I tried:
awk '!NF {print;next}; !($1 in a) {a[$1];print}' file
I also tried splitting the file into two parts:
file 1: the first column, with duplicates removed and the first instance kept > output1
file 2: the second column
paste output1 file2 > file-output
Is there a way to do this with a simple awk one-liner?
Upvotes: 3
Views: 153
Reputation: 2471
To keep the format of the lines, you can try:
awk '$1!=prev{prev=new=$1;gsub("."," ",new);print;next}{sub($1,new)}1' input
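If $1 contains regexp metacharacters:
awk '
$1!=prev {
    prev=new=$1
    gsub("."," ",new)        # new = as many spaces as $1 has characters
    print
    next
}
{
    # one element per character (empty separator: common extension, not POSIX)
    i=split($1,a,"")
    b=""
    for(j=1;j<=i;j++)
        b=b "[" a[j] "]"     # bracket each character so it matches literally
    sub(b,new)
}
1' input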
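Bracketing each character still trips over characters that are special inside brackets, such as `^`. A minimal sketch of a regex-free alternative, assuming the duplicates are adjacent as in the sample: find the key with index() and splice in padding with substr(). The function name blank_repeats is hypothetical and used only for illustration.

```shell
# blank_repeats: hypothetical helper, not part of the answer above.
# Reads from the files given as arguments, or from stdin if there are none.
blank_repeats() {
  awk '
    ($1 "") == (prev "") {                       # force a string comparison
      start = index($0, $1)                      # where the key begins in the raw line
      pad   = sprintf("%" length($1) "s", "")    # as many spaces as the key is long
      $0    = substr($0, 1, start - 1) pad substr($0, start + length($1))
      print
      next                                       # prev keeps the original key
    }
    { prev = $1 }
    1
  ' "$@"
}

printf 'a+b  x\na+b  y\nabc  z\n' | blank_repeats
```

Because the key is never used as a pattern, `a+b` is treated as plain text and the two spaces after it are left alone.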
Upvotes: 1
Reputation: 203169
Anything that assigns to $1 makes awk rebuild the whole record, replacing the white space between fields with OFS. The way to really do what you asked for is:
$ awk 'seen[$1]++{rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1' file
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122-123,126,132.
       25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122.
       21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
       116,118,124-125,120,122-123,126,132.
The above only removes the duplicate $1 values and leaves everything else, including white space within and between fields, exactly as-is.
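To see the difference, here is a small sketch that runs the in-place sub() next to a plain assignment to $1. The two-line sample with three spaces between the columns is made up for the demonstration, the function names are hypothetical, and `tr` turns spaces into dots only to make them visible.

```shell
# Hypothetical sample: three spaces separate the two columns.
sample() { printf '444444   21-84,87\n444444   116,118\n'; }

# Edit $0 in place with sub(): spacing between the columns is preserved.
in_place() {
  awk 'seen[$1]++ {rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1'
}

# Assign to $1: awk rebuilds $0 and joins the fields with OFS (one space).
reassign() {
  awk 'seen[$1]++ {$1="      "} 1'
}

sample | in_place | tr ' ' '.'
# 444444...21-84,87
# .........116,118

sample | reassign | tr ' ' '.'
# 444444...21-84,87
# .......116,118
```

With the assignment, the second line loses two of the three separating spaces, so the columns no longer line up.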
Upvotes: 1
Reputation: 784918
This awk may work for you:
awk 'seen[$1]++{$1="\t\t"} 1' file
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122.
232323 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
Upvotes: 2
Reputation: 133428
If your Input_file is sorted by the first column, as in your sample, the following may help:
awk '{cur=$1} cur==prev{$1=" "} {prev=cur} 1' Input_file
Solution 2: if your Input_file is not sorted, the following may help:
awk '++a[$1]>1{$1=" "} 1' Input_file
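The difference between comparing with the previous line and counting keys in an array only shows on unsorted input. A small sketch with a made-up three-line sample (function names are hypothetical); the prev-based variant here saves the key before blanking $1, so the comparison on the next line still sees the original value.

```shell
# Blank $1 only when it repeats the key of the line directly above.
adjacent_only() {
  awk '{cur=$1} cur==prev{$1=" "} {prev=cur} 1'
}

# Blank $1 whenever the key has been seen anywhere earlier in the input.
anywhere() {
  awk '++a[$1]>1{$1=" "} 1'
}

printf 'a 1\nb 2\na 3\n' | adjacent_only
# a 1
# b 2
# a 3

printf 'a 1\nb 2\na 3\n' | anywhere
# a 1
# b 2
#   3
```

The array version keeps one entry per distinct key in memory, while the prev-based version stores only the last key, which may matter for very large sorted files.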
Upvotes: 1