Reputation: 67
I need to print lines that contain duplicated fields. I tried using sed, but
it's not working.
Input file has two lines:
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
The output should be only the second line, because it contains an exactly duplicated string (field).
But the command below prints both lines:
sed -rn '/(\b\w+\b).*\b\1\b/ p' input_file
Thanks
RKP
Upvotes: 4
Views: 196
Reputation: 8711
Using Perl - regex and backreference
perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' file
Thanks to @Sundeep for finding the subtle catch and to @zdim for helping to fix it.
With the inputs below:
$ cat input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1
2.5 42 32.5 abc
part cop par
spar cop par
$ perl -nle ' print if /(?:^|\s)(\S+)\s+.*?(?<=\s)\1(?:\s+|$)/ms ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a
$
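Another method, using a hash and a lookbehind. The match returns (field, last character) pairs, so the fields become the keys of %k; a line has a duplicate when the number of keys differs from the number of fields: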
$ perl -lane ' %k=/(\S+)(?<=(.))/g ; print if scalar(@F) != scalar(keys %k) ' input
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a
$
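The same counting idea can be sketched in awk, in case Perl is not at hand. This is only an illustration, not part of the Perl answer: the function name dup_by_count and the array name seen are made up here, and split("", seen) is used as a portable way to empty the array for each line.

```shell
# Hypothetical helper: print lines whose number of distinct fields
# is smaller than the total number of fields (NF).
dup_by_count() {
  awk '{
    split("", seen)                 # empty the array for every new line
    distinct = 0
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) {          # first time this field value is seen
        seen[$i]
        distinct++
      }
    if (distinct != NF) print       # fewer distinct values than fields
  }' "$@"
}

printf '%s\n' 'a b c' 'a b b' 'a a' '1' 'part cop par' | dup_by_count
# prints:
# a b b
# a a
```

Called without arguments it reads standard input; with file names it reads those files.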
Upvotes: 2
Reputation: 203665
Best I can tell from your question all you need is:
$ awk '$1==$3' file
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
If that's not all you need then update your question to provide more truly representative sample input/output.
Upvotes: 1
Reputation: 12438
Input:
$ cat input
a b c
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
1 2 3
a b c
a b b
a a
1
Command:
awk '{for(i=1;i<=NF-1;i++)for(j=i+1;j<=NF;j++)if($i == $j){print; next}}' input
Output:
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
a b b
a a
Explanations:
The solution from RavinderSingh13 is better in terms of time complexity (a single pass over the fields instead of comparing every pair), but it uses more memory because it has to store the field values of the current line in an associative array.
{
    for (i = 1; i <= NF - 1; i++) {      #outer loop from 1 to NF-1
        for (j = i + 1; j <= NF; j++) {  #inner loop from i+1 to NF
            if ($i == $j) {              #compare the two selected fields
                print $0                 #print the whole line
                next                     #jump to the next line
            }
        }
    }
}
Upvotes: 2
Reputation: 133538
Adding a GENERIC solution with only one loop. It checks whether any two fields in the line are the same (handy if you do NOT want to hard-code field numbers).
awk '{delete a;for(i=1;i<=NF;i++){if(++a[$i]>1){print;next}}}' Input_file
With your sample input, the output is as follows.
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
Explanation: a detailed explanation of the above code.
awk '                      ##Starting awk program here.
{                          ##Starting main BLOCK here.
  delete a                 ##Clearing array a so that counts from the previous line do not carry over.
  for(i=1;i<=NF;i++){      ##Running a for loop from i=1 up to NF, where NF is awk's built-in count of fields in the current line.
    if(++a[$i]>1){         ##Incrementing the count in array a for index $i and checking whether it is now greater than 1; if yes, run the following.
      print                ##Printing the current line, since per OP a line with 2 equal fields should be printed.
      next                 ##Skipping the rest of the for loop and moving to the next line, since a match is already found.
    }                      ##Closing BLOCK for if condition.
  }                        ##Closing BLOCK for for loop here.
}                          ##Closing main BLOCK here.
' Input_file               ##Mentioning Input_file name here.
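If it helps to check which value triggered the match, the same single loop can report the line number and the repeated field instead of the whole line. This is just a sketch built on the idea above: the function name report_dups and the array name count are made up for illustration, and split("", count) stands in for delete a as a portable way to reset the array.

```shell
# Hypothetical helper: for each line with a repeated field,
# print "<line number>: <first repeated value>".
report_dups() {
  awk '{
    split("", count)                  # reset the counts for every line
    for (i = 1; i <= NF; i++)
      if (++count[$i] > 1) {          # second occurrence of this value
        print FNR ": " $i
        next                          # one report per line is enough
      }
  }' "$@"
}

printf '%s\n' 'a b c' 'a b b' 'x y x z' | report_dups
# prints:
# 2: b
# 3: x
```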
Upvotes: 2
Reputation: 23667
With grep if -P is available, or with perl:
$ cat ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
2.5 42 32.5 abc
3.14 3.14 123
part cop par
$ grep -P '(?<!\S)(\S++).*(?<!\S)\1(?!\S)' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123
$ perl -ne 'print if /(?<!\S)(\S++).*(?<!\S)\1(?!\S)/' ip.txt
s1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u0
3.14 3.14 123
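For comparison, here is a rough sketch of why the \b\w+\b attempt from the question prints both lines: \w stops at the slashes, so path components such as s1 count as separate "words", and s1 occurs twice on each line. The awk below only imitates that tokenisation by splitting on anything that is not a letter, digit or underscore; the function name first_repeated_word is made up for illustration.

```shell
# Hypothetical helper: split each line into \w+ style tokens and
# print the first token that occurs twice, or "(none)".
first_repeated_word() {
  awk '{
    split("", seen)                            # reset for every line
    n = split($0, w, /[^A-Za-z0-9_]+/)         # tokens made of word characters
    for (i = 1; i <= n; i++)
      if (w[i] != "" && seen[w[i]]++) {        # token already seen on this line
        print w[i]
        next
      }
    print "(none)"
  }' "$@"
}

printf '%s\n' 's1/s2/s3/s4/s5/u0 a1_b2_c3_d4_e5_f6_g7 s1/s2/s3/s4/s5/u1' | first_repeated_word
# prints:
# s1
```

So the first input line qualifies under word boundaries even though none of its whitespace-separated fields is repeated, which is why the patterns here anchor on whitespace instead.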
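(?<!\S) asserts that the preceding character is not a non-whitespace character, i.e. the match starts at the beginning of a field.
(\S++) captures all the non-whitespace characters of the field; the possessive quantifier ensures that partial fields won't match.
.* matches any number of characters in between.
(?<!\S)\1(?!\S) matches the captured text only as an entire field, courtesy of the lookaround assertions for non-whitespace characters.
Upvotes: 2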
Reputation: 58430
This might work for you (GNU sed):
sed -E 'h;s/\s*(\S+)\s*/\n\1\n/g;/(\n[^\n]+\n).*\1/!d;g' file
Make a copy of the current line in the hold space.
Surround each non-whitespace string with newlines, replacing any whitespace on either side of it.
Delete the adulterated line if there are no duplicates.
Otherwise replace the pattern space by the copy of the original line from the hold space and print.
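To see what the substitution step leaves in the pattern space, the l command can print it unambiguously. This is only a sketch and assumes GNU sed, like the answer itself (\s, \S and \n in the replacement are GNU extensions); the function name show_split is made up for illustration.

```shell
# Hypothetical helper (GNU sed assumed): show the pattern space after
# every field has been wrapped in newlines.
show_split() {
  sed -nE 's/\s*(\S+)\s*/\n\1\n/g;l' "$@"
}

printf '%s\n' 'a b b' | show_split
# prints:
# \na\n\nb\n\nb\n$
```

Each field is now delimited by a newline on both sides, so (\n[^\n]+\n).*\1 can only match when a whole field repeats, here \nb\n.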
Upvotes: 1
Reputation: 14454
[@BenjaminW. has rightly observed that I have slightly misread the question. My answer is left below for reference but I withdraw it as a candidate answer to the question.]
This does what you want:
sort input_file | uniq -d
The sort command sorts the input file's contents so that, once sorted, identical lines appear next to one another. The uniq command would ordinarily collapse repeated lines but, when invoked with the -d option, instead prints only repeated lines.
Of course, my solution is acceptable only if using sed is not a requirement.
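A small sketch of what this pipeline reports, which also shows the misreading mentioned at the top: it finds whole lines that occur more than once, not lines containing a repeated field. The function name repeated_lines is made up for illustration.

```shell
# Hypothetical wrapper around the pipeline above.
repeated_lines() {
  sort "$@" | uniq -d
}

printf '%s\n' 'a b c' 'a b b' 'a b c' | repeated_lines
# prints:
# a b c
```

The line a b b has a duplicated field but occurs only once, so it is not printed.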
Upvotes: 1
Reputation: 5252
You can use awk to do it:
awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j){print;next}}' input_file
It is not limited to 3 columns, and it does not matter where the duplicate occurs.
If you want the reverse, i.e. to print the lines that have no duplicates:
awk '{for(i=1;i<NF;i++)for(j=i+1;j<=NF;j++)if($i==$j)next; print}' input_file
Upvotes: 0