vol

Reputation: 58

Bash script to filter out files based on size

I have a lot of log files, all with unique file names; however, judging by size, many have exactly the same content (bot-generated attacks).

I need to filter out duplicate file sizes, or include only unique file sizes. 95% are not unique, and since I can see the file sizes, I could manually choose which sizes to filter out.

I have worked out that

find . -size 48c | xargs ls -lSr -h

will give me only the logs of 48 bytes, and I could continue with this method to build up a long string of included files.

uniq does not support file sizes, as far as I can tell; it only compares text lines.
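Though feeding it one size per line might get around that; a sketch, assuming GNU find's -printf:

# print each file's size on its own line, then keep sizes that occur only once
find . -type f -printf '%s\n' | sort -n | uniq -u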

find does have a -not option; is this where I should be looking?
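Something like this, perhaps, where 48c is from my example above and 1024c is a made-up second size I might have identified as a duplicate:

# exclude files whose sizes match known bot-generated duplicates
find . -type f -not -size 48c -not -size 1024c | xargs ls -lSr -h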

How can I efficiently filter out the known duplicates?

Or is there a different method to filter and display logs based on unique size only?

Upvotes: 2

Views: 1657

Answers (2)

oliver

Reputation: 2843

You nearly had it. Does going with this provide a solution:

find . -size 48c | xargs
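With no command given, xargs defaults to echo, so the matching paths are printed on a single line (the file names below are invented for illustration):

$ find . -size 48c | xargs
./attack-01.log ./attack-07.log ./attack-12.log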

Upvotes: 0

Mischa

Reputation: 2298

One solution is:

find . -type f -ls | awk '!x[$7]++ {print $11}'

$7 is the file-size column in the -ls output; $11 is the pathname. Since you are using find, I assume there are subdirectories, which you don't want listed; -type f restricts the match to regular files.

The awk part prints the path of only the first file seen with a given size. HTH
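As an aside, a variant of the same idea that does not depend on the -ls column layout, assuming GNU find's -printf, is:

# %s = size in bytes, %p = path; keep the first path seen per size
find . -type f -printf '%s\t%p\n' | awk -F'\t' '!seen[$1]++ {print $2}'

This also copes with file names containing spaces, which would otherwise shift the -ls columns so that $11 holds only part of the path.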

Upvotes: 1
