Reputation: 159
I have files similar to the following.
$ ls -1 *.ts | sort -V
media_w1805555829_b1344100_sleng_2197.ts
media_w1805555829_b1344100_sleng_2198.ts
media_w1805555829_b1344100_sleng_2199.ts
media_w1805555829_b1344100_sleng_2200.ts
media_w1501256294_b1344100_sleng_2199.ts
media_w1501256294_b1344100_sleng_2200.ts
media_w1501256294_b1344100_sleng_2201.ts
media_w1501256294_b1344100_sleng_2202.ts
This prints the duplicate suffixes:
$ ls -1 *.ts | sort -V | grep -oP '.*_\K.*(?=\.ts)' | sort | uniq -d | sed 's/^/DUPLICATE---:> /'
DUPLICATE---:> 2199
DUPLICATE---:> 2200
I want the output:
DUPLICATE---:> media_w1805555829_b1344100_sleng_2199.ts
DUPLICATE---:> media_w1805555829_b1344100_sleng_2200.ts
DUPLICATE---:> media_w1501256294_b1344100_sleng_2199.ts
DUPLICATE---:> media_w1501256294_b1344100_sleng_2200.ts
Upvotes: 1
Views: 49
Reputation: 12465
Use this Perl one-liner:
ls -1 *.ts | perl -lne '
    if ( /_(\d+)\.ts$/ ) {
        $cnt{$1}++;
        push @files, [ $_, $1 ];
    }
    END {
        for ( grep $cnt{$_->[1]} > 1, @files ) {
            print "DUPLICATE---:> $_->[0]";
        }
    }'
This eliminates the need to sort.
The %cnt hash holds the count of the suffixes (the parts of the filename that you want to find duplicates in).
@files is an array of arrays. Each of its elements is an anonymous array with 2 elements: the file name and the suffix.
grep $cnt{$_->[1]} > 1, @files : The grep selects the elements of the @files array where the suffix is a dupe.
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
SEE ALSO:
perldoc perlrun : how to execute the Perl interpreter: command line switches
Upvotes: 1
Reputation: 12917
ls -1 *.ts | sort -V | awk -F'[_.]' '
{
map[$5]+=1;
map1[$5][$0]
}
END {
for (i in map)
{
if(map[i]>1)
{
for (j in map1[i])
{
print "DUPLICATE---:> "j
}
}
}
}' | sort
One liner
ls -1 *.ts | sort -V | awk -F'[_.]' '{ map[$5]+=1;map1[$5][$0] } END { for (i in map) { if(map[i]>1) { for (j in map1[i]) { print "DUPLICATE---:> "j } } } }' | sort
Using awk, set the field separator to _ or . (quoting -F'[_.]' so the shell cannot glob-expand it). Then create two arrays. The first (map) holds a count for each number in the file path. The second (map1) is a multidimensional array, with the first index as the number and the second as the complete line (file path); arrays of arrays like this are a GNU awk extension. At the end, we loop through the map array and check for any counts greater than one. If we find any, we loop through the corresponding map1 entries and print the lines (second index) along with the additional text. We finally run the output through sort again to get the ordering as required.
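To see how that field splitting works on one of the names from the question, the counter ends up in the 5th field:

```shell
# Splitting on "_" and "." turns the name into:
#   media / w1805555829 / b1344100 / sleng / 2197 / ts
echo 'media_w1805555829_b1344100_sleng_2197.ts' |
    awk -F'[_.]' '{ print $5 }'
```

This prints 2197, which is why the arrays above are keyed on $5.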
Upvotes: 1