Reputation: 58
I have a lot of log files, all with unique file names; however, judging by size, many have exactly the same content (bot-generated attacks).
I need to filter out duplicate file sizes, or include only unique file sizes. 95% are not unique, and since I can see the file sizes, I could manually choose which sizes to filter out.
I have worked out that
find . -size 48c | xargs ls -lSr -h
will give me only the logs of 48 bytes, and I could continue with this method to build a long string of included files.
uniq does not support file size, as far as I can tell.
find does have a -not option, so this may be where I should be looking?
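For example, to exclude a couple of known duplicate sizes (a rough sketch; 48c and 96c are placeholders for sizes I would pick by hand):
find . -type f -not -size 48c -not -size 96c | xargs ls -lSr -h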
How can I efficiently filter out the known duplicates? Or is there a different method to filter and display logs based on unique size only?
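For instance, would something like this work (a rough sketch, assuming GNU find's -printf and no tabs or newlines in the file names)? It counts how many files have each size and prints only the files whose size occurs exactly once:
find . -type f -printf '%s\t%p\n' | awk -F'\t' '{c[$1]++; p[$1]=$2} END {for (s in c) if (c[s] == 1) print p[s]}'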
Upvotes: 2
Views: 1657
Reputation: 2843
You nearly had it; does going with this provide a solution:
find . -size 48c | xargs
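With no command given, xargs defaults to echo, so this just prints the matching file names on one line. If names may contain spaces, a safer variant (assuming GNU find and xargs) would be:
find . -size 48c -print0 | xargs -0 ls -lSr -h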
Upvotes: 0
Reputation: 2298
One solution is:
find . -type f -ls | awk '!x[$7]++ {print $11}'
$7 is the filesize column; $11 is the pathname.
Since you are using find, I assume there are subdirectories, which you don't want listed themselves (hence -type f). The awk part prints the path of only the first file with each given size.
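Note that $11 breaks on paths containing spaces. A variant that sidesteps the column counting (a sketch, assuming GNU find's -printf and no tabs or newlines in the file names):
find . -type f -printf '%s\t%p\n' | awk -F'\t' '!x[$1]++ {print $2}'
Same idea: it prints the first path seen for each distinct size.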
HTH
Upvotes: 1