HRusby

Reputation: 25

Bash: Reading a column from ls -l

For a problem at uni I need to get the file size and file name of the 5 largest files in a series of directories. To do this I'm using two functions: one loads everything in with ls -l (I realize that parsing ls output isn't a good method, but this particular problem specifies that I can't use find, locate or du). Each line of the ls output is then sent to another function, which should use awk to extract the file size and file name and store them in an array. Instead, awk seems to treat every column of the ls output as a file name to be opened and read. The code for this is as follows:

function addFileSize {
    local y=0
    local curLine=$1
    if [[ -z "${sizeArray[0]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[1]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[2]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[3]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    elif [[ -z "${sizeArray[4]}" ]]; then
        i=$(awk '{print $5}' $curLine)
        nameArray[y]=$(awk '{print $9}' $curLine)
    fi  

    for i in "${sizeArray[@]}"; do
        echo "$(awk '{print $5}' $curLine)"
        if [[ -z "$i" ]]; then
            i=$(awk '{print $5}' $curLine)
            nameArray[y]=$(awk '{print $9}' $curLine)
            break
        elif [[ $i -lt $(awk '{print $5}' $curLine) ]]; then
            i=$(awk '{print $5}' $curLine)
            nameArray[y]=$(awk '{print $9}' $curLine)
            break
        fi
        let "y++"
    done
    echo "Name Array:"
    echo "${nameArray[@]}"
    echo "Size Array:"
    echo "${sizeArray[@]}"
}

function searchFiles {
    local curdir=$1
    for i in $( ls -C -l -A $curdir | grep -v ^d | grep -v ^total ); do # Searches through all files in the current directory
        if  [[ -z "${sizeArray[4]}" ]]; then
            addFileSize $i
        elif [[ ${sizeArray[4]} -lt $(awk '{print $5}' $i) ]]; then
            addFileSize $i
        fi
    done
}

Any help would be greatly appreciated, thanks.

Upvotes: 1

Views: 953

Answers (5)

Dmitry Grigoryev

Reputation: 3203

If you can't use find, locate, or du, there's still a straightforward way to get the file size without resorting to parsing ls:

size=$(wc -c < "$file")

wc is smart enough to detect a regular file on its standard input and stat it for the size instead of reading the whole file, so this works just as fast.
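
For example, a minimal sketch of the original top-5 task built on wc -c (dir1, dir2 and dir3 are placeholder directory names):

# Sketch only: substitute your own directories for dir1 dir2 dir3.
for f in dir1/* dir2/* dir3/*; do
    [ -f "$f" ] || continue                    # regular files only
    printf '%d %s\n' "$(wc -c < "$f")" "$f"    # byte count, then the name
done | sort -n | tail -n 5                     # five largest, smallest first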

Upvotes: 0

RJHunter

Reputation: 2867

If the problem is specifically supposed to be about parsing, then awk might be a good option (although ls output is challenging to parse reliably). Likewise, if the problem is about working with arrays, then your solution should focus on those.

However, if the problem is there to encourage learning about the tools available to you, I would suggest:

  • the stat tool prints particular pieces of information about a file (including size)
  • the sort tool re-orders lines of input
  • the head and tail tools print the first and last lines of input
  • and your shell can also perform pathname expansion to list files matching a glob wildcard pattern like *.txt

Imagine a directory with some files of various sizes:

  10000000 sound/concert.wav
   1000000 sound/song.wav
    100000 sound/ding.wav

You can use pathname expansion to find their names:

$ echo sound/*
sound/concert.wav sound/ding.wav sound/song.wav

You can use stat to turn a name into a size:

$ stat -f 'This one is %z bytes long.' sound/ding.wav
This one is 100000 bytes long.

Like most Unix tools, stat works the same whether you provide it one argument or several:

$ stat -f 'This one is %z bytes long.' sound/concert.wav sound/ding.wav sound/song.wav
This one is 10000000 bytes long.
This one is 100000 bytes long.
This one is 1000000 bytes long.

(Check man stat for reference on %z and what else you can print. The file's name, %N, is particularly useful.)
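
For example, combining the size and the name in one format string gives output like:

$ stat -f '%z %N' sound/*
10000000 sound/concert.wav
100000 sound/ding.wav
1000000 sound/song.wav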


Now you have a list of file sizes (and hopefully you've kept their names around too). How do you find which sizes are biggest?

It's much easier to find the biggest item in a sorted list than an unsorted list. To get a feel for it, think about how you might find the highest two items in this unsorted list:

1234 5325 3243 4389 5894 245 2004 45901 3940 3255

Whereas if the list is sorted, you can find the biggest items very quickly indeed:

245 1234 2004 3243 3255 3940 4389 5325 5894 45901

The Unix sort utility takes lines of input and outputs them from lowest to highest (or in reverse order with sort -r).

It defaults to sorting character-by-character, which is great for words ("apple" comes before "balloon") but not so great for numbers ("10" comes before "9"). You can activate numeric sorting with sort -n.
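
A quick way to see the difference:

$ printf '%s\n' 10 9 100 | sort
10
100
9
$ printf '%s\n' 10 9 100 | sort -n
9
10
100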


Once you have a sorted list of lines, you can print the first lines with the head tool, or print the last lines using the tail tool.

The first two items of the (already-sorted) list of words for spell-checking:

$ head -n 2 /usr/share/dict/words
A
a

The last two items:

$ tail -n 2 /usr/share/dict/words
Zyzomys
Zyzzogeton

With those pieces, you can assemble a solution to the problem "find the five biggest files across dir1, dir2, dir3":

stat -f '%z %N' dir1/* dir2/* dir3/* |  
     sort -n |  
     tail -n 5  

Or a solution to "find the biggest file in each of dir1, dir2, dir3, dir4, dir5":

for dir in dir1 dir2 dir3 dir4 dir5; do  
    stat -f '%z %N' "$dir"/* |  
        sort -n |  
        tail -n 1  
done

Upvotes: 3

Firefly

Reputation: 459

This would be another choice. awk's "\t" escape separates the columns with a tab; a literal tab can also be typed at the prompt by pressing Ctrl+V and then Tab (Ctrl+I).

ls -lS dir1 dir2 dir3.. | awk 'BEGIN{print "Size\tName"} NR <= 6 {print $5"\t"$9}'

Upvotes: 0

Jan Nielsen

Reputation: 11829

Without using find, locate, or du, you could do the following for each directory:

    ls -Sl|grep ^\-|head -5|awk '{printf("%s %d\n", $9, $5);}'

which lists all files sorted by size, filters out the directories, takes the top 5, and prints the file name and size. Wrap it in a bash loop over the directories, as sketched below.
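
A minimal sketch of that loop, assuming the directories are passed to the script as arguments:

# Sketch only: "$@" holds the directories to examine.
for dir in "$@"; do
    echo "== $dir =="
    ls -Sl "$dir" | grep '^-' | head -5 | awk '{printf("%s %d\n", $9, $5);}'
done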

Upvotes: 1

Robert

Reputation: 8663

Use ls -S to sort by size, pipe through head to get the top five, pipe through sed to compress multiple spaces into one, then pipe through cut to get the size and file name fields.

robert@habanero:~/scripts$ ls -lS | head -n 5 | sed -e 's/  */ /g' | cut -d " " -f 5,9
32K xtractCode.pl
29K tmd55.pl
24K tagebuch.pl
14K backup

Just specify the directories as arguments to the initial ls.

Upvotes: 0
