Reputation: 35

print duplicate entries without deleting unix/linux

Let's say I have a file like this, with two columns separated by a dash:

    56-cde
    67-cde
    56-cao
    67-cgh
    78-xyz
    456-hhh
    456-jjjj
    45678-nnmn
    45677-abdc
    45678-aief

I am trying to get an output like this:

    56-cde
    56-cao
    67-cde
    67-cgh
    456-hhh
    456-jjjj
    45678-aief
    45678-nnmn

So basically, instead of printing the unique values, I need to print the duplicates.

I tried to accomplish this using awk like this:

    cat input.txt | awk -F"-" '{print $1,$2}' | sort -n | uniq -w 2 -D

This does show which values in column 1 are duplicated, along with their respective column 2 values. But because I am hardcoding the comparison width to 2 characters, it only detects duplicates among the two-digit numbers in column 1. Is there a way to do this using awk?

Thanks in advance.

Upvotes: 1

Answers (5)

stack0114106

Reputation: 8791

Using Perl:

    $ cat two_cols.txt
    56-cde
    67-cde
    56-cao
    67-cgh
    78-xyz
    456-hhh
    456-jjjj
    45678-nnmn
    45677-abdc
    45678-aief

    $ perl -F"-" -lane ' @t=@{$kv{$F[0]}}; push(@t,$_); $kv{$F[0]}=[@t]; END { while(($x,$y)=each(%kv)){ print join("\n",@{$y}) if scalar @{$y}>1 }} ' two_cols.txt
    67-cde
    67-cgh
    56-cde
    56-cao
    456-hhh
    456-jjjj
    45678-nnmn
    45678-aief

Upvotes: 0

karakfa

Reputation: 67567

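Another awk solution without arrays (but with a presort):

    sort -n file | awk -F- '
        NR==1 {p=$1; a=$0; next}        # first record: start a group
        p==$1 {a=a RS $0; c++; next}    # same key: append to the group, count the repeat
        c     {print a}                 # key changed: print the previous group if it had repeats
              {a=$0; p=$1; c=0}         # start a new group
        END   {if (c) print a}'         # flush the last group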

Upvotes: 1

Jeff Y

Reputation: 2466

I would handle the varying-number-of-digits case by pre-conditioning the data so that the number field is a fixed large width (and use that width in uniq):

    cat input.txt | awk -F- '{printf "%12d-%s\n",$1,$2}' | sort | uniq -w 12 -D

If you need the output left-justified as well, just tack on this post-conditioning step:

    | awk '{print $1}'

Upvotes: 0

Jeff Y

Reputation: 2466

See if your uniq has a -D option. My cygwin version does:

    cat input.txt | sort | uniq -w 2 -D
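
If your uniq lacks `-D`, or the key width varies as in the question, one alternative is to read the file twice with awk: count each key on the first pass, then print the lines whose key occurred more than once on the second. This is a different technique from the uniq pipeline above, offered only as a sketch; the function name `dups_two_pass` is made up for illustration, and it assumes the key is everything before the first dash.

```shell
# Hypothetical helper: print every line whose first dash-separated
# field occurs more than once in the file given as $1.
dups_two_pass() {
    # The file is passed twice. While NR==FNR awk is still reading the
    # first copy, so it only counts keys. On the second copy it prints
    # the lines whose key was counted more than once.
    awk -F- 'NR==FNR { count[$1]++; next } count[$1] > 1' "$1" "$1"
}

# Usage (assuming the data is in input.txt):
#   dups_two_pass input.txt
```

The lines come out in their original input order, so duplicates of the same key are not necessarily adjacent; pipe the result through `sort -n` if you want them grouped.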

Upvotes: 1

undur_gongor

Reputation: 15954

This is what I came up with (just an awk program, no external sort, uniq etc.):

    BEGIN { FS = "-" }

    { arr[$1] = arr[$1] "-" $2 }

    END {
        for (i in arr) {
            if ((n = split(arr[i], a)) < 3) continue
            for (j = 2; j <= n; ++j)
                print i"-"a[j]
        }
    }

For each number, it collects all of the attached strings in arr, joined by dashes (so it assumes the strings themselves won't contain dashes).

With gawk, you could use arrays of arrays in order to avoid the concatenation and splitting with dashes.
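
The concatenate-and-split step can also be avoided in plain POSIX awk, without gawk's arrays of arrays, by keeping a per-key counter and storing each line under a simulated two-dimensional subscript. The following is only a sketch under that assumption: the names `print_dups`, `cnt`, `line` and `order` are made up for illustration, and the extra `order` array exists solely to print keys in first-seen order rather than in the arbitrary order of `for (i in arr)`.

```shell
# Hypothetical helper: print all lines whose first dash-separated field
# is duplicated, grouped by key in order of first appearance.
print_dups() {
    awk -F- '
        {
            if (!($1 in cnt)) order[++nkeys] = $1   # remember first-seen order of keys
            n = ++cnt[$1]                           # how many lines this key has so far
            line[$1, n] = $0                        # store the whole line under (key, n)
        }
        END {
            for (k = 1; k <= nkeys; k++) {
                key = order[k]
                if (cnt[key] < 2) continue          # skip keys that occur only once
                for (j = 1; j <= cnt[key]; j++)
                    print line[key, j]
            }
        }
    ' "$@"
}

# Usage with the sample data from the question, read from standard input:
printf '%s\n' 56-cde 67-cde 56-cao 67-cgh 78-xyz \
    456-hhh 456-jjjj 45678-nnmn 45677-abdc 45678-aief | print_dups
```

Because whole lines are stored, the strings after the number may contain dashes. On the sample data this prints the 56, 67, 456 and 45678 groups in that order, with the lines inside each group in input order.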

Upvotes: 0
