Reputation: 67

How to print lines with duplicated fields?

I need to print lines that contain duplicated fields. I tried using sed, but it's not working.
The input file has two lines:

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

The output should be only the second line, because it contains exactly duplicated strings (fields).
But the command below prints both lines:

sed -rn '/(\b\w+\b).*\b\1\b/ p' input_file

Thanks
RKP

Upvotes: 4

Answers (8)

stack0114106

Reputation: 8711

Using Perl - regex and backreference

perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' file

Thanks to @Sundeep for finding the subtle catch and to @zdim for helping to fix it.

With the inputs below:

$ cat  input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1
2.5 42 32.5 abc
part cop par
spar cop par

$ perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

$

Another method using hash/lookbehind

$ perl -lane ' %k=/(\S+)(?<=(.))/g ; print if scalar(@F) != scalar(keys %k) ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

$

Upvotes: 2

Ed Morton

Reputation: 203665

Best I can tell from your question all you need is:

$ awk '$1==$3' file
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

If that's not all you need then update your question to provide more truly representative sample input/output.

Upvotes: 1

Allan

Reputation: 12438

Input:

$ cat input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1

Command:

awk '{for(i=1;i<=NF-1;i++)for(j=i+1;j<=NF;j++)if($i == $j){print; next}}' input

Output:

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a

Explanations:

The solution from RavinderSingh13 is better in terms of time complexity but uses more memory, as it needs to save the field values of the line in an associative array.

{
        for (i = 1; i <= NF - 1; i++) { #outer loop from 1 to NF-1
                for (j = i + 1; j <= NF; j++) { #inner loop from i+1
                        if ($i == $j) { #value comparison of the two elements selected
                                print $0 #print
                                next    #jump to next line
                        }
                }
        }
}
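
For readers who do not read awk, the same pairwise comparison can be sketched in Python. This is only an illustration, not part of the original answer: the function name and sample lines are made up for the example, and it compares fields strictly as strings, whereas awk may compare numeric-looking fields numerically.

```python
def has_duplicate_field_pairwise(line: str) -> bool:
    """Return True if any two whitespace-separated fields are equal."""
    fields = line.split()  # roughly awk's default field splitting
    # Outer loop: every field except the last one.
    for i in range(len(fields) - 1):
        # Inner loop: every field to the right of field i.
        for j in range(i + 1, len(fields)):
            if fields[i] == fields[j]:  # string comparison only
                return True  # stop at the first duplicate, like print; next
    return False


# Hypothetical sample lines taken from the input shown above.
sample = ["a b c", "a b b", "a a", "1"]
for line in sample:
    if has_duplicate_field_pairwise(line):
        print(line)
```

The nested loops compare each pair of fields once, so the work grows quadratically with the number of fields, but nothing is stored beyond the fields themselves.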

Upvotes: 2

RavinderSingh13

Reputation: 133538

Adding a GENERIC solution that uses only one loop. It checks whether any two fields in a line are the same (handy when you do NOT want to hard-code field numbers).

awk '{delete a;for(i=1;i<=NF;i++){if(++a[$i]>1){print;next}}}'  Input_file

With your shown samples, the output will be as follows.

s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0

Explanation: Adding a detailed explanation of the above code.

awk '                           ##Starting awk program here.
{                               ##Starting main BLOCK here.
  delete a                      ##Clearing array a so that counts from the previous line do not carry over.
  for(i=1;i<=NF;i++){           ##Starting a for loop which runs from i=1 up to NF, where NF is a built-in awk variable holding the number of fields.
    if(++a[$i]>1){              ##Incrementing the count in array a at index $i and checking whether it is now greater than 1; if yes, run the following.
      print                     ##Printing current line, since per OP a line with 2 equal fields should be printed.
      next                      ##Using next to skip the rest of the for loop and move to the next line, since NO further checks are needed once a match is found.
    }                           ##Closing BLOCK for if condition.
  }                             ##Closing BLOCK for for loop here.
}                               ##Closing main BLOCK here.
'   Input_file                  ##Mentioning Input_file name here.
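
As a rough cross-check of the logic, the same single-pass idea can be sketched in Python with a set standing in for the awk array. The function name and sample lines are assumptions made for this illustration, and fields are compared as plain strings.

```python
def has_duplicate_field(line: str) -> bool:
    """Return True as soon as a field is seen for the second time."""
    seen = set()  # fresh for every line, playing the role of `delete a`
    for field in line.split():
        if field in seen:  # comparable to ++a[$i] > 1
            return True    # stop early, like print; next
        seen.add(field)
    return False


# Hypothetical sample lines for the illustration.
sample = ["a b c", "a b b", "x y z x"]
for line in sample:
    if has_duplicate_field(line):
        print(line)
```

Each field is visited once, at the cost of remembering the fields already seen on the current line.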

Upvotes: 2

Sundeep

Reputation: 23667

With grep, if -P is available, or with perl:

$ cat ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
2.5 42 32.5 abc
3.14 3.14 123
part cop par

$ grep -P '(?<!\S)(\S++).*(?<!\S)\1(?!\S)' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123

$ perl -ne 'print if /(?<!\S)(\S++).*(?<!\S)\1(?!\S)/' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123

(?<!\S) asserts that the preceding character is not a non-whitespace character, so the match starts at the beginning of a field
(\S++) captures all non-whitespace characters; the possessive quantifier ensures partial fields won't match
.* matches any number of characters in between
(?<!\S)\1(?!\S) matches an entire field, courtesy of the lookaround assertions for non-whitespace characters
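
The pieces above can be tried out in Python as well. This is a sketch rather than a port: possessive quantifiers are only available in newer Python versions, so it swaps `(\S++)` for `(\S+)(?!\S)`, a lookahead that likewise forces the capture to cover the whole field. The constant and function names are assumptions for the example.

```python
import re

# (\S+)(?!\S) stands in for the possessive (\S++): the capture must end
# where the field ends, so backtracking cannot shrink it to a partial field.
WHOLE_FIELD_REPEAT = re.compile(r'(?<!\S)(\S+)(?!\S).*(?<!\S)\1(?!\S)')


def has_repeated_field(line: str) -> bool:
    """Return True if some whole field occurs again later in the line."""
    return WHOLE_FIELD_REPEAT.search(line) is not None


# Lines taken from ip.txt above.
lines = [
    "s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1",
    "s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0",
    "2.5 42 32.5 abc",
    "3.14 3.14 123",
    "part cop par",
]
for line in lines:
    if has_repeated_field(line):
        print(line)
```

Without the extra lookahead (or the possessive quantifier), the capture could backtrack from `part` to `par` and wrongly match the last line.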

Upvotes: 2

potong

Reputation: 58430

This might work for you (GNU sed):

sed -E 'h;s/\s*(\S+)\s*/\n\1\n/g;/(\n[^\n]+\n).*\1/!d;g' file

Make a copy of the current line in the hold space.

Surround each non-whitespace string with newlines, replacing any whitespace on either side of it.

Delete the adulterated line if there are no duplicates.

Otherwise replace the pattern space by the copy of the original line from the hold space and print.
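
The steps above can be mimicked in Python to see what the intermediate text looks like. This is a sketch under assumptions: the function name is invented, and no hold-space copy is needed because the original string is left untouched.

```python
import re


def has_duplicate_field_sed_style(line: str) -> bool:
    """Mirror the sed steps: wrap every field in newlines, then look for a repeat."""
    # Like s/\s*(\S+)\s*/\n\1\n/g: "a b b" becomes "\na\n\nb\n\nb\n".
    marked = re.sub(r'\s*(\S+)\s*', r'\n\1\n', line)
    # Like /(\n[^\n]+\n).*\1/: a newline-delimited field that occurs again.
    # re.DOTALL lets . cross the inserted newlines, as it does in the sed pattern space.
    return re.search(r'(\n[^\n]+\n).*\1', marked, re.DOTALL) is not None


# Hypothetical sample lines for the illustration.
for line in ["a b c", "a b b", "part cop par"]:
    if has_duplicate_field_sed_style(line):
        print(line)  # print the untouched original, like the final g
```

Because every field ends up delimited by a newline on both sides, the backreference can only match a complete field, never a substring of one.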

Upvotes: 1

thb

Reputation: 14454

[@BenjaminW. has rightly observed that I have slightly misread the question. My answer is left below for reference but I withdraw it as a candidate answer to the question.]

This does what you want:

sort input_file | uniq -d

The sort command sorts the input file's contents so that, once sorted, identical lines appear next to one another. The uniq command ordinarily would collapse repeated lines, but when invoked with the -d option, instead prints only repeated lines.

Of course, my solution is acceptable only if using sed is not a requirement.

Upvotes: 1

Tyl

Reputation: 5252

You can use awk to do it:

awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j){print;next}}' input_file

It's not limited to 3 columns, and it works no matter where the duplicate occurs.

If you want the reverse, i.e. to print the lines that have no duplicates:

awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j)next; print}' input_file

Upvotes: 0
