Reputation: 724

Remove duplicate records in first column but don't modify the rest of the columns

I would like to remove the duplicate values in column 1, keeping the first instance, while leaving the rest of the columns untouched.

input

444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
444444              116,118,124-125,120,122-123,126,132.                       
444444              25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
444444              110,118,124-125,120,122-123,126,132.                       
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
111111              116,118,124-125,120,122.                                   
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              116,118,124-125,120,122-123,126,132.

output

444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122-123,126,132.                       
                    25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    110,118,124-125,120,122-123,126,132.                       
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122.                                   
                    21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122-123,126,132.

I tried:

 awk '!NF {print;next}; !($1 in a) {a[$1];print}' file

I also tried splitting the file into two parts:

file 1: the first column, with duplicates removed and the first instance kept > output1
file 2: the second column
paste output1 file2 > file-output

Is there a way to do this in a single awk command?

Upvotes: 3

Answers (4)

ctac_

Reputation: 2471

To keep the format of the lines, you can try:

awk '$1!=prev{prev=new=$1;gsub("."," ",new);print;next}{sub($1,new)}1' input

If $1 contains regexp metacharacters:

awk '
  $1!=prev {
    prev=new=$1
    gsub("."," ",new)
    print
    next }
  { i=split($1,a,//)
    b=""
    for(j=1;j<=i;j++)
    b=b "[" a[j] "]"
    sub(b,new) }
1' input
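A regex-free variant of the same idea, offered as a sketch rather than a drop-in replacement: instead of wrapping every character of $1 in brackets (which can still trip over characters such as ^ or \), locate the key with index() and rebuild the line with substr(). The sample keys (a+b, c.d) are made up for illustration; a+b is a case where the regex a+b does not match the literal text, so sub($1,new) would leave the duplicate in place.

```shell
# Hypothetical sample data with regex metacharacters in column 1.
input='a+b    one
a+b    two
c.d    three'

out=$(printf '%s\n' "$input" | awk '
  $1 != prev {                 # first line of a new run: print unchanged
    prev = $1
    print
    next
  }
  {
    blank = sprintf("%" length($1) "s", "")   # spaces as wide as the key
    p = index($0, $1)                         # literal search, no regex
    $0 = substr($0, 1, p - 1) blank substr($0, p + length($1))
    print
  }')

printf '%s\n' "$out"
```

Like the commands above, this compares each key with the previous line only, so it assumes duplicates sit on consecutive lines.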

Upvotes: 1

Ed Morton

Reputation: 203169

Anything that modifies $1 does modify the record. The way to really do what you asked for is:

$ awk 'seen[$1]++{rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1' file
444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122-123,126,132.
                    25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    110,118,124-125,120,122-123,126,132.
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122.
                    21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122-123,126,132.

The above only removes the duplicate $1 values and leaves everything else, including white space within and between fields, exactly as-is.
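One way to check that claim, as a rough sketch: run the command on a small sample with irregular spacing and compare everything after the key column. The sample lines and the 6-character key width are assumptions made for this illustration.

```shell
# Hypothetical sample with uneven spacing between and inside fields.
tmp=$(mktemp)
printf '%s\n' \
  '444444     21-84,87,  85-86' \
  '444444     116,118' \
  '111111     21-84' \
  '444444     25-84' > "$tmp"

out=$(awk 'seen[$1]++{rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1' "$tmp")
printf '%s\n' "$out"

# Everything after the 6-character key should be identical in input and output.
rest_in=$(cut -c7- "$tmp")
rest_out=$(printf '%s\n' "$out" | cut -c7-)
[ "$rest_in" = "$rest_out" ] && echo "rest of line unchanged"
rm -f "$tmp"
```

Because seen[] remembers every key, the last 444444 line is blanked too, even though it is not adjacent to the earlier ones.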

Upvotes: 1

anubhava

Reputation: 784918

This awk may work for you:

awk 'seen[$1]++{$1="\t\t"} 1' file

444444   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122-123,126,132.
111111   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122.
232323   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122-123,126,132.
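Worth noting when reading the output above: assigning to $1 makes awk rebuild the record by joining the fields with OFS (a single space by default), so the original run of spaces between the columns is not preserved. A small sketch of the difference, using a made-up sample line:

```shell
line='444444      116,118,   124-125'

# Assigning to a field rebuilds $0 with OFS between fields.
rebuilt=$(printf '%s\n' "$line" | awk '{$1="\t\t"} 1')

# Editing $0 with sub() leaves the remaining whitespace alone.
kept=$(printf '%s\n' "$line" | awk '{sub(/^[^ ]+/, "      ")} 1')

# Brackets make the leading whitespace visible.
printf '[%s]\n[%s]\n' "$rebuilt" "$kept"
```

If the exact spacing matters, editing $0 (as in the sub() line) is the safer route; if it does not, the field assignment is shorter.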

Upvotes: 2

RavinderSingh13

Reputation: 133428

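If your Input_file is grouped by the first column (duplicates on consecutive lines), as in your sample, the following may help. The key is saved before $1 is overwritten, so runs of three or more duplicates are handled as well.

awk '{key=$1} key==prev{$1="                   "} 1; {prev=key}'   Input_file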

Solution 2: If your Input_file is not grouped this way, the following may help.

 awk '++a[$1]>1{$1="                   "} 1'   Input_file
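To make the difference between the two approaches concrete, here is a small sketch on made-up input where a key reappears after a different one. A visible "-" stands in for the blank padding so the result is easy to read; that placeholder and the sample data are assumptions for this illustration only.

```shell
input='A 1
B 2
A 3
A 4'

# Adjacent-only: compare with the previous line's key (saved before $1 changes).
adjacent=$(printf '%s\n' "$input" | awk '{key=$1} key==prev{$1="-"} {prev=key} 1')

# Global: remember every key seen so far in an array.
global=$(printf '%s\n' "$input" | awk '++a[$1]>1{$1="-"} 1')

printf 'adjacent:\n%s\n\nglobal:\n%s\n' "$adjacent" "$global"
```

The adjacent-only version keeps "A 3" because the line before it has key B, while the array version blanks every repeat of A regardless of position.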

Upvotes: 1
