Scott

Reputation: 611

awk, IFS, and file name truncations

Updated question based on new information…

Here is a gist of my code, with the general idea that I store items in Dropbox at:

~/Dropbox/Public/drops/xx.xx.xx/whatever

Where the date is always 2 chars, 2 chars, and 2 chars, dot separated. Within that folder can be more folders and more files, which is why when I use find I do not set the depth and allow it to scan recursively. https://gist.github.com/anonymous/ad51dc25290413239f6f

Below is a shortened version of the gist; I don't believe it will run as it stands, but the gist itself will run, assuming you have Dropbox installed and there are files at the path location that I set up.

General workflow:
SIZE="+250k" # For `find` this is the value in size I am looking for files to be larger than
# Location where I store the output to `find` to process that file further later on.
TEMP="/tmp/drops-output.txt" 

Next I rm the tmp file and touch a new one.

I will then cd into
DEST=/Users/$USER/Dropbox/Public/drops

Perform a quick conditional check to make sure that I am working where I want to be, 
with all my values as variables, I could mess up easily and not be working where I 
thought I would be.
# Conditional check: is the current directory the one I want to be the working directory?
if [ "$(pwd)" = "${DEST}" ]; then
    echo -e "Destination and current working directory are equal, this is good!:\n    $(pwd)\n"
fi
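A minimal sketch of the reset-and-check steps above (rm, touch, cd, verify), with an error branch added so a failed cd stops the script instead of letting find run against the wrong tree. It uses a throwaway mktemp directory as a stand-in for the real $DEST so the example is self-contained; substitute /Users/$USER/Dropbox/Public/drops in the actual script.

```shell
DEST="$(mktemp -d)"                 # stand-in for ~/Dropbox/Public/drops
TEMP="/tmp/drops-output.txt"

rm -f "$TEMP" && touch "$TEMP"      # start with a fresh results file
cd "$DEST" || { echo "cannot cd to $DEST" >&2; exit 1; }

if [ "$(pwd)" = "$DEST" ]; then
    echo "Destination and current working directory are equal, this is good!"
else
    echo "In $(pwd) but expected $DEST" >&2
    exit 1
fi
```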

The meat of step one is the `find` command
# Use `find` to locate a subset of files that are larger than a certain size
# save that to a temp file and process it.  I believe this could all be done in 
# one find command with -exec or similar but I can't figure it out
find . -type f -size "${SIZE}" -exec ls -lh {} \; >> "$TEMP"

Inside $TEMP will be a data set that looks like this:
-rw-r--r--@ 1 me  staff    61K Dec 28  2009 /Users/me/Dropbox/Public/drops/12.28.09/wor-10e619e1-120407.png
-rw-r--r--@ 1 me  staff   230K Dec 30  2009 /Users/me/Dropbox/Public/drops/12.30.09/hijack-loop-d6250496-153355.pdf
-rw-r--r--@ 1 me  staff    49K Dec 31  2009 /Users/me/Dropbox/Public/drops/12.31.09/mt-5a819185-180538.png

The trouble is, not all file names are free of spaces, though I have done all I can to make sure variables are quoted 
and wrapped in braces or quotes where applicable.
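As an aside (an alternative, not part of the original script): the space problem can be sidestepped entirely by letting `du` report sizes instead of parsing `ls` output. `du -k` separates the size and the path with a tab, and tabs do not occur in these file names, so paths containing spaces survive awk intact. The demo tree below is fabricated for the example.

```shell
demo="$(mktemp -d)"
mkdir -p "$demo/12.28.09"
head -c 300000 /dev/zero > "$demo/12.28.09/Bruna Legal Name.pdf"   # above the 250k cutoff
head -c 1000   /dev/zero > "$demo/12.28.09/small.txt"              # below the cutoff

# du emits "<KB-blocks><TAB><path>", so awk -F'\t' keeps spacey paths whole
find "$demo" -type f -size +250k -exec du -k {} + |
  sort -n |
  awk -F'\t' '{ printf "%sK\t%s\n", $1, $2 }'

rm -r "$demo"
```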

With the results in /tmp I run:
# Number of results located as a result of the find `command` above
RESULTS=$(wc -l "$TEMP" | awk '{print $1}')
echo -e "Located: [$RESULTS] total files larger than $SIZE\n"

# With a result set found via `find`, now use awk to print out the sorted list of file 
# sizes and paths.
echo -e "SIZE    DATE      FILE PATH"
#awk '{print "["$5"]          ", $9, $10}' < "$TEMP" | sort -n
awk '{for(i=5;i<=NF;i++) {printf $i " "} ; printf "\n"}' "$TEMP" | sort -n

With the changes to awk from how I had it originally, my result now looks like this:
751K Oct 21 19:00 ./10.21.14/netflix-67-190039.png 
760K Sep 14 19:07 ./01.02.15/logos/RCA_old_logo.jpg 
797K Aug 21 03:25 ./08.21.14/girl-88-032514.zip 
916K Sep 11 21:47 ./09.11.14/small-shot-4d-214727.png

I want it to look like this:
SIZE    FILE PATH
========================================
751K    ./10.21.14/netflix-67-190039.png 
760K    ./01.02.15/logos/RCA_old_logo.jpg 
797K    ./08.21.14/girl-88-032514.zip 
916K    ./09.11.14/small-shot-4d-214727.png
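Given the `ls -lh` layout in $TEMP (field 5 = size, fields 9 onward = the path), one sketch that produces the desired layout without truncating spacey names is to rebuild the path from field 9 to the end of the line and drop the date columns. The sample lines below are canned; in the real script the awk would read from "$TEMP" instead.

```shell
printf '%s\n' \
  '-rw-r--r--@ 1 me  staff    61K Dec 28  2009 ./11.26.14/Bruna Legal Name.pdf' \
  '-rw-r--r--@ 1 me  staff   230K Dec 30  2009 ./12.30.09/hijack-loop.pdf' |
awk '{
    path = $9
    for (i = 10; i <= NF; i++) path = path " " $i   # re-join names that contain spaces
    printf "%-7s %s\n", $5, path                    # size column, then full path
}' | sort -n
```

One caveat: re-joining with single spaces would collapse runs of consecutive spaces inside a name; `substr($0, index($0, $9))` avoids that, at the small risk of matching $9 earlier in the line.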

# All Done
if [ "$?" -eq 0 ]; then
    echo -e "find of drop files larger than $SIZE completed without errors.\n"
    exit 0
fi

Original Post to Stack prior to gaining some new information leading to new questions…

Original Post is below, given new information, I tried some new tactics and have left myself with the above script and info.

I have a simple script on Mac OS X; it performs a find on a dir and locates all files of type file and of size greater than +SIZE.

These are then appended to a file via >>

From there, I have a file that essentially contains a ls -la listing, so I use awk to get to the file size and the file name with this command:

# With a result set found via `find`, now use awk to print out the sorted list of file 
# sizes and paths.
echo -e "SIZE          FILE PATH"
awk '{print "["$5"]          ", $9, $10}' < "$TEMP" | sort -n

All works as I want it to, but I get some filename truncation right at the above code. The entire script is around 30 lines, and I have pinned the problem to this line. I think if I threw in a different internal field separator that would fix it; I could use \t, since there can't be a \t in a Mac OS X filename.

I thought it was just quoting, but I can't seem to see where if that is the case. Here is a sample of the data returned, usually I get about 50 results. The first one I stuffed in this file has filename truncation:

[1.0M]           ./11.26.14/Bruna Legal
[1.4M]           ./12.22.14/card-88-082636.jpg 
[1.6M]           ./12.22.14/thrasher-8c-082637.jpg 
[11M]           ./01.20.15/td-6e-225516.mp3 

Bruna Legal is "Bruna Legal Name.pdf" on the filesystem.

Upvotes: 0

Views: 450

Answers (1)

Birei

Reputation: 36282

You can avoid parsing the output of the ls command and do the whole work with find using the printf action, like:

find /tmp -maxdepth 1 -type f -size +4k 2>/dev/null -printf "%kKB %f\n" |
  sort -nrk1,1

In my example it outputs every file that is bigger than 4 kilobytes. The issue is that the find command cannot print formatted output with the size in MB. In addition, the numeric ordering does not work for me with square brackets surrounding the number, so I omit them. In my test it yields:

140KB +~JF7115171557203024470.tmp
140KB +~JF3757415404286641313.tmp
120KB +~JF8126196619419441256.tmp
120KB +~JF7746650828107924225.tmp
120KB +~JF7068968012809375252.tmp
120KB +~JF6524754220513582381.tmp
120KB +~JF5532731202854554147.tmp
120KB +~JF4394954996081723171.tmp
24KB +~JF8516467789156825793.tmp
24KB +~JF3941252532304626610.tmp
24KB +~JF2329724875703278852.tmp
16KB 578829321_2015-01-23_1708257780.pdf
12KB 575998801_2015-01-16_1708257780-1.pdf
8KB adb.log

EDIT: I've noticed that %k is not accurate enough, so you can use %s to print the size in bytes and convert to KB or MB using awk, like:

find /tmp -maxdepth 1 -type f -size +4k 2>/dev/null -printf "%s %f\n" | 
  sort -nrk1,1 | 
  awk '{ $1 = sprintf( "%.2fKB", $1 / 1024 ); print }'
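The conversion step can be sanity-checked in isolation on canned input (the file names here are invented for the demo); folding the KB suffix into sprintf keeps the unit attached to the formatted value instead of being dropped by awk's numeric coercion:

```shell
printf '%s\n' '140279 a.tmp' '4104 adb.log' |
awk '{ $1 = sprintf("%.2fKB", $1 / 1024); print }'
# → 136.99KB a.tmp
# → 4.01KB adb.log
```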

It yields:

136.99KB +~JF7115171557203024470.tmp
136.99KB +~JF3757415404286641313.tmp
117.72KB +~JF8126196619419441256.tmp
117.72KB +~JF7068968012809375252.tmp
117.72KB +~JF6524754220513582381.tmp
117.68KB +~JF7746650828107924225.tmp
117.68KB +~JF5532731202854554147.tmp
117.68KB +~JF4394954996081723171.tmp
21.89KB +~JF8516467789156825793.tmp
21.89KB +~JF3941252532304626610.tmp
21.89KB +~JF2329724875703278852.tmp
14.14KB 578829321_2015-01-23_1708257780.pdf
10.13KB 575998801_2015-01-16_1708257780-1.pdf
4.01KB adb.log

Upvotes: 2
