Reputation: 25
For a problem at uni I need to get the file size and file name of the 5 largest files in a series of directories. To do this I'm using two functions: one loads everything in with ls -l (I realize that parsing ls output isn't a good method, but this particular problem specifies that I can't use find, locate or du). Each line of the ls output is then sent to another function, which should use awk to extract the file size and file name and store them in an array. Instead, awk seems to be trying to open every column of the ls output as a file to read. The code is as follows:
function addFileSize {
    local y=0
    local curLine=$1
    if [[ -z "${sizeArray[0]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[1]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[2]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[3]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[4]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    fi
    for i in "${sizeArray[@]}"; do
        echo "$(awk '{print $5}' $curLine)"
        if [[ -z "$i" ]]; then
            i=$(awk '{print $5}' $curLine)
            nameArray[y]=$(awk '{print $9}' $curLine)
            break
        elif [[ $i -lt $(awk '{print $5}' $curLine) ]]; then
            i=$(awk '{print $5}' $curLine)
            nameArray[y]=$(awk '{print $9}' $curLine)
            break
        fi
        let "y++"
    done
    echo "Name Array:"
    echo "${nameArray[@]}"
    echo "Size Array:"
    echo "${sizeArray[@]}"
}

function searchFiles {
    local curdir=$1
    for i in $( ls -C -l -A $curdir | grep -v ^d | grep -v ^total ); do # Searches through all files in the current directory
        if [[ -z "${sizeArray[4]}" ]]; then
            addFileSize $i
        elif [[ ${sizeArray[4]} -lt $(awk '{print $5}' $i) ]]; then
            addFileSize $i
        fi
    done
}
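If it helps, the symptom can be reproduced with a single made-up ls -l line: passing the line as an argument makes awk treat each column as a filename to open, whereas feeding the line on awk's stdin reads it as input text.

```shell
# A hypothetical ls -l line, just for demonstration
curLine='-rw-r--r-- 1 user group 4096 Jan  1 12:00 notes.txt'

# Wrong: awk treats every non-option argument as a *file to read*,
# so an unquoted $curLine hands awk each column as a filename:
#   awk '{print $5}' $curLine
# Right: feed the line on stdin instead
size=$(echo "$curLine" | awk '{print $5}')
name=$(echo "$curLine" | awk '{print $9}')
echo "$size $name"   # 4096 notes.txt
```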
Any help would be greatly appreciated, thanks.
Upvotes: 1
Views: 953
Reputation: 3203
If you can't use find, locate, or du, there's still a straightforward option to get the file size without resorting to ls parsing:
size=$(wc -c < "$file")
wc is smart enough to detect a regular file on stdin and call stat to get the size, so this works just as fast.
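A self-contained sketch of this approach, using a throwaway temp file:

```shell
# Hypothetical demo file; wc -c on redirected stdin reports the byte
# count without reading the file name from ls output.
file=$(mktemp)
printf 'hello world' > "$file"   # 11 bytes
size=$(wc -c < "$file")
echo "$size"                     # 11 (may be space-padded on BSD wc)
rm -f "$file"
```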
Upvotes: 0
Reputation: 2867
If the problem is specifically supposed to be about parsing, then awk might be a good option (although ls output is challenging to parse reliably). Likewise, if the problem is about working with arrays, then your solution should focus on those.
However, if the problem is there to encourage learning about the tools available to you, I would suggest:
Imagine a directory with some files of various sizes:
10000000 sound/concert.wav
 1000000 sound/song.wav
  100000 sound/ding.wav
You can use pathname expansion to find their names:
$ echo sound/*
sound/concert.wav sound/ding.wav sound/song.wav
You can use stat to turn a name into a size:
$ stat -f 'This one is %z bytes long.' sound/ding.wav
This one is 100000 bytes long.
Like most Unix tools, stat works the same whether you provide it one argument or several:
$ stat -f 'This one is %z bytes long.' sound/concert.wav sound/ding.wav sound/song.wav
This one is 10000000 bytes long.
This one is 100000 bytes long.
This one is 1000000 bytes long.
(Check man stat for reference about %z and what else you can print. The file's Name is particularly useful.)
Now you have a list of file sizes (and hopefully you've kept their names around too). How do you find which sizes are biggest?
It's much easier to find the biggest item in a sorted list than an unsorted list. To get a feel for it, think about how you might find the highest two items in this unsorted list:
1234 5325 3243 4389 5894 245 2004 45901 3940 3255
Whereas if the list is sorted, you can find the biggest items very quickly indeed:
245 1234 2004 3243 3255 3940 4389 5325 5894 45901
The Unix sort utility takes lines of input and outputs them from lowest to highest (or in reverse order with sort -r). It defaults to sorting character-by-character, which is great for words ("apple" comes before "balloon") but not so great for numbers ("10" comes before "9"). You can activate numeric sorting with sort -n.
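The difference is easy to see on a tiny list:

```shell
# Character-by-character sort: '1' < '2' < '9', so "10" lands before "2"
charsort=$(printf '10\n9\n2\n' | sort | xargs)
# Numeric sort compares the values instead
numsort=$(printf '10\n9\n2\n' | sort -n | xargs)
echo "$charsort"   # 10 2 9
echo "$numsort"    # 2 9 10
```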
Once you have a sorted list of lines, you can print the first lines with the head tool, or print the last lines using the tail tool.
The first two items of the (already-sorted) list of words for spell-checking:
$ head -n 2 /usr/share/dict/words
A
a
The last two items:
$ tail -n 2 /usr/share/dict/words
Zyzomys
Zyzzogeton
With those pieces, you can assemble a solution to the problem "find the five biggest files across dir1, dir2, dir3":
stat -f '%z %N' dir1/* dir2/* dir3/* |
sort -n |
tail -n 5
Or a solution to "find the biggest file in each of dir1, dir2, dir3, dir4, dir5":
for dir in dir1 dir2 dir3 dir4 dir5; do
stat -f '%z %N' "$dir"/* |
sort -n |
tail -n 1
done
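One portability note: the -f format flag above is BSD stat syntax (as on macOS); GNU coreutils stat spells the same idea -c, with %s for size and %n for name. A sketch of the same pipeline in GNU form, run on a throwaway directory with made-up file sizes:

```shell
# Throwaway directory standing in for dir1/dir2/dir3
tmp=$(mktemp -d)
printf '%10000s' ' ' > "$tmp/concert.wav"   # 10000 bytes
printf '%100s'   ' ' > "$tmp/ding.wav"      #   100 bytes
printf '%1000s'  ' ' > "$tmp/song.wav"      #  1000 bytes

# GNU equivalent of: stat -f '%z %N' ... | sort -n | tail -n 1
biggest=$(stat -c '%s %n' "$tmp"/* | sort -n | tail -n 1)
echo "$biggest"    # e.g. 10000 /tmp/tmp.XXXX/concert.wav
rm -r "$tmp"
```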
Upvotes: 3
Reputation: 459
This would be another choice. Ctrl+V followed by Tab is how to insert a literal tab at the command line; each "Ctrl+V+I" in the command below stands for such a tab.
ls -lS dir1 dir2 dir3.. | awk 'BEGIN{print "Size""Ctrl+V+I""Name"}NR <= 6{print $5"Ctrl+V+I"$9}'
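If the literal keystroke is inconvenient, awk's "\t" escape produces the same tab. A sketch on a throwaway directory (NR > 1 skips the "total" line that ls -l prints first; the directory names and files here are made up):

```shell
# Demo directory with two files of known sizes
tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/a.txt"   # 4 bytes
printf 'bb'   > "$tmp/b.txt"   # 2 bytes

# Same idea as above, with \t instead of a literal tab keystroke
out=$(ls -lS "$tmp" | awk 'BEGIN{print "Size\tName"} NR > 1 && NR <= 6 {print $5 "\t" $9}')
echo "$out"
rm -r "$tmp"
```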
Upvotes: 0
Reputation: 11829
Without using find
, locate
, or du
, you could do the following for each directory:
ls -Sl|grep ^\-|head -5|awk '{printf("%s %d\n", $9, $5);}'
which lists all files by size, filters out directories, takes the top 5, and prints the file name and size. Wrap with a loop in bash for each directory.
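The wrapping loop might look like the sketch below (the demo directories and files are hypothetical; cd-ing inside a subshell keeps $9 as the bare file name and leaves the caller's working directory alone):

```shell
# Hypothetical directories for the demo
mkdir -p demo/dir1 demo/dir2
printf '123456' > demo/dir1/big.txt     # 6 bytes
printf '1'      > demo/dir1/small.txt   # 1 byte
printf '12'     > demo/dir2/only.txt    # 2 bytes

out=$(for dir in demo/dir1 demo/dir2; do
          # Subshell: cd affects only this iteration
          ( cd "$dir" && ls -Sl | grep '^-' | head -5 |
            awk '{printf("%s %d\n", $9, $5);}' )
      done)
echo "$out"
rm -r demo
```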
Upvotes: 1
Reputation: 8663
Use ls -S
to sort by size, pipe through head
to get the top five, pipe through sed
to compress multiple spaces into one, then pipe through cut
to get the size and file name fields.
robert@habanero:~/scripts$ ls -lS | head -n 5 | sed -e 's/  */ /g' | cut -d " " -f 5,9
32K xtractCode.pl
29K tmd55.pl
24K tagebuch.pl
14K backup
Just specify the directories as arguments to the initial ls
.
Upvotes: 0