pepco2

Reputation: 61

awk check file exists

printf "2015-03-02|/home/user/.ssh/config\n2015-03-02|/home/user/Desktop/temp328\n" | awk -F\| 'if ( -f $2 )  { print $2}'

or

printf "2015-03-02|/home/user/.ssh/config\n2015-03-02|/home/user/Desktop/temp328\n" | awk -F\| '{if (system("test -f" $2)) print $2}'

/home/user/.ssh/config - exists

/home/user/Desktop/temp328 - removed

I want to print only the files that exist, but these commands are not working.

Upvotes: 6

Views: 13214

Answers (7)

Evgeny

Reputation: 31

GNU Awk comes with a loadable extension written in C, "filefuncs", which provides filesystem data about files, directories, sockets, etc. I suppose the quickest way to get information about a file is not an external call but this internal function.

#!/usr/bin/gawk -f
@load "filefuncs"
function exist(file) {
    return stat(file, null)
}
BEGIN {
    print exist("/etc/passwd")
}

If the file exists, stat() returns 0; otherwise it returns -1.
'null' is just any unused name for an array (the second argument is required!).
If you don't want to define a function at all, here you go:

#!/usr/bin/gawk -f
@load "filefuncs"
BEGIN{print stat("/etc/passwd", null)}
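Applied to the question's date|path input, the same stat() call can be used directly as an awk pattern, so no function or if is needed. A minimal sketch, assuming GNU awk is installed as gawk with the filefuncs extension available; the function name print_existing and the sample paths are only illustrative:

```shell
#!/bin/sh
# Sketch: print only the paths (field 2) that exist, using gawk's filefuncs stat().
# Assumes GNU awk is installed as "gawk" and can load the filefuncs extension.
print_existing() {
    gawk -F'|' '
        @load "filefuncs"
        # stat() returns 0 when the path exists and -1 otherwise;
        # "st" is only a scratch array that stat() fills in
        stat($2, st) == 0 { print $2 }
    '
}

printf '2015-03-02|/etc/passwd\n2015-03-02|/no/such/file\n' | print_existing
```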

Upvotes: 0

RARE Kpop Manifesto

Reputation: 2805

I'm re-posting my answer from another thread here, since it seems relevant to checking files. I'm mostly adding the generic case of how system() can be leveraged to do strange things.

In fact, under certain circumstances you can leverage system() to get the output you want directly, without having to format a command, run it through getline, store the result temporarily, reset RS (if you've set it to "^$" before), and close the command before returning the output, like this:

$ ls -lFGnd "./selectWoo.full.min.js"*
-rw-r--r--  1 501  20  77079 Jul 26 13:07 ./selectWoo.full.min.js.txt
$ mawk2 'function filetest(fn) {
    gsub(/\047/, "&\134\047&", fn)   # in case of a single quote in the filename
    # sh exits with the status of the [ -s ... ] test run inside the backquotes
    return system(" exit \140 [ -s \047" (fn) "\047 ] \140 ") \
        ? "cannot locate" \
        : "exist_and_non_empty"
}
BEGIN {
    ORS = "\n\n"
    fn_pfx = "./selectWoo.full.min.js"
    print "\nvalid file :: "      filetest(fn_pfx ".txt")
    print "non-existent file :: " filetest(fn_pfx ".txt_fake")
}'

valid file :: exist_and_non_empty

non-existent file :: cannot locate

I'm only making it more verbose here for illustrative purposes. Instead of returning whether the system() call was successful or not, we directly set the exit code to be that of the file test.

If you want to simplify the return to be boolean, then make it

return ! system(…)

  • I haven't tested every POSIX file/directory info check flag out there, but I can't imagine there are more than a handful that might fail this code.

You can perform other tasks, too, as long as the outputs are non-negative integers (assume the value is reduced to exit_code % 256 before it is returned) and you're comfortable interpreting that output. A quick example (\047 is the single quote ', \045 is the percent sign %, and \140 is the grave accent `):

mawk2 'BEGIN {
    a = "0123456789ABCDEF"
    print system(" exit \140 printf \047\045s\047 \047" a "\047 | wc -c \140 ")
}'

which correctly prints 16, the length of the string.

I'm fully aware this is a horrible way of using system( ) and POSIX exit codes.
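For comparison, here is a sketch of the conventional approach that the exit-code trick avoids: build the command string, read its output with cmd | getline, then close() the pipe. It is not limited to values below 256. The function name str_len_demo is hypothetical:

```shell
#!/bin/sh
# Sketch: the conventional way to get a command's output into awk:
# build the command string, read its output with "cmd | getline",
# then close() the pipe.
str_len_demo() {
    awk 'BEGIN {
        a = "0123456789ABCDEF"
        cmd = "printf \047%s\047 \047" a "\047 | wc -c"
        cmd | getline n     # wc -c prints a single line with the byte count
        close(cmd)          # close the pipe so the same command could be run again
        print n + 0         # numeric conversion drops any padding wc adds
    }'
}

str_len_demo
```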

Upvotes: 0

Ilya K.

Reputation: 153

You can easily do this in Bash and pipe the results to Awk.

% ls
file_list file1 file3
% cat file_list
file1
file2
file3
file4
% cat file_list | bash -c 'while IFS= read -r file ; do [ -f "$file" ] || echo "No file: $file"; done'
No file: file2
No file: file4
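Applied to the question's date|path input, the shell can do the file test and pipe only the lines that survive to awk. A sketch under those assumptions (the function name filter_existing is hypothetical; the awk stage here only prints the path but could do any further processing):

```shell
#!/bin/sh
# Sketch: the shell does the file test; only existing entries reach awk.
# Input lines look like "date|path", as in the question.
filter_existing() {
    while IFS='|' read -r date path; do
        # keep the line only if the path is an existing regular file
        [ -f "$path" ] && printf '%s|%s\n' "$date" "$path"
    done | awk -F'|' '{ print $2 }'
}

printf '2015-03-02|/etc/passwd\n2015-03-02|/no/such/file\n' | filter_existing
```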

Upvotes: 0

will

Reputation: 5061

This isn't really my answer, but it hasn't been documented here yet. "The GNU Awk User's Guide" gives this method:

  # readable.awk --- library file to skip over unreadable files

  BEGIN {
      for (i = 1; i < ARGC; i++) {
          if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
              || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
              continue    # assignment or standard input
          else if ((getline junk < ARGV[i]) < 0) # unreadable
              delete ARGV[i]
          else
              close(ARGV[i])
      }
  }

The snippet itself processes the command-line arguments. The useful bit for this question is the else if ...

   else if ((getline junk < ARGV[i]) < 0) # unreadable
        delete ARGV[i]
      :

That is basically a getline on the file named in ARGV[i]; when it fails (the file does not exist or is unreadable), the array element is deleted.

Either way, the file can't be used. Everything happens in the same awk process: no exec to the shell, etc.

I needed this today, so I wrote the following small function:

  ##  file_exist
  #     * ref: [12.3.3 Checking for Readable Data Files](http://langevin.univ-tln.fr/cours/COMPIL/tps/awk.html#File-Checking)
  #         o [The GNU Awk User's Guide](http://langevin.univ-tln.fr/cours/COMPIL/tps/awk.html)
  #

  function file_exist(  file_path, _rslt, _junk  )
  {
      _rslt = (0==1);     #   false

      # getline returns 1 (line read), 0 (empty file) or -1 (missing/unreadable)
      if( (getline _junk < file_path) >= 0 )    ## exists and is readable
      {
          _rslt = (1==1);     #   true
          close( file_path );
      }
      return _rslt;
  }

Note:

  • Function returns TRUE when the file is empty
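A self-contained sketch of the same getline check, run over the question's date|path lines in a single awk process. It compares with >= 0 so that empty files count as existing, and it calls close() after every check so that many (or repeated) paths can be tested without running out of file descriptors. The shell function name existing_paths is hypothetical:

```shell
#!/bin/sh
# Sketch: getline-based existence check applied to "date|path" lines.
# getline returns 1 (line read), 0 (empty file) or -1 (missing or unreadable).
existing_paths() {
    awk -F'|' '
        function file_exist(file_path,    _rslt, _junk) {
            _rslt = ((getline _junk < file_path) >= 0)
            close(file_path)   # release the descriptor before the next check
            return _rslt
        }
        file_exist($2) { print $2 }
    '
}

printf '2015-03-02|/etc/passwd\n2015-03-02|/no/such/file\n' | existing_paths
```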

Upvotes: 0

James Brown

Reputation: 37404

With GNU awk you can use stat() included with the filefuncs extension:

$ ls -l 
-rw-r--r-- 1 james james 4 Oct  3 12:48 foo
-rw------- 1 root  root  0 Oct  3 12:48 bar

Awk:

$ awk -v file=foo '
@load "filefuncs"
BEGIN {
    ret=stat(file,fdata)
    printf "ret:  %d\nsize: %d\n",ret,fdata["size"]
}'

Output for -v file=foo:

ret:  0
size: 4

for bar:

ret:  0
size: 0

and for nonexistent baz:

ret:  -1
size: 0
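Since stat() also fills in the file type, it can mimic the question's test -f (regular file only). A sketch, assuming GNU awk is installed as gawk with filefuncs, where fdata["type"] is "file" for a regular file; by default stat() does not follow symbolic links, so a symlink reports "symlink". The function name regular_files is hypothetical:

```shell
#!/bin/sh
# Sketch: gawk equivalent of `test -f` on "date|path" lines via filefuncs stat().
# Assumes GNU awk ("gawk") with the filefuncs extension.
regular_files() {
    gawk -F'|' '
        @load "filefuncs"
        # stat() returns -1 for a missing path; st["type"] is "file" for a
        # regular file ("directory", "symlink", ... otherwise)
        stat($2, st) == 0 && st["type"] == "file" { print $2 }
    '
}

printf '2015-03-02|/etc/passwd\n2015-03-02|/etc\n' | regular_files
```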

Upvotes: 5

ghoti

Reputation: 46836

It's easy to check for the existence of a readable file in awk, without having to resort to spawning something with system(). Just try to read from the file.

From awk's man page (on my system anyway):

In all cases, getline returns 1 for a successful input, 0 for end of file, and -1 for an error.

So. Some example code.

#!/usr/bin/awk -f

function file_exists(file) {
  n=(getline _ < file);
  if (n > 0) {
    print "Found: " file;
    return 1;
  } else if (n == 0) {
    print "Empty: " file;
    return 1;
  } else {
    print "Error: " file;
    return 0;
  }
}

BEGIN {

  file_exists(ARGV[1]);

}

Gives me these results:

$ touch /tmp/empty
$ touch /tmp/noperm ; chmod 000 /tmp/noperm
$ ./check.awk /etc/passwd
Found: /etc/passwd
$ ./check.awk /nonexistent
Error: /nonexistent
$ ./check.awk /tmp/empty
Empty: /tmp/empty
$ ./check.awk /tmp/noperm
Error: /tmp/noperm

Using your sample data:

$ fmt="2015-03-02|/home/user/.ssh/config\n2015-03-02|/home/user/Desktop/temp328\n"
$ printf "$fmt" | cut -d\| -f2 | xargs -n 1 ./check.awk
Error: /home/user/.ssh/config
Error: /home/user/Desktop/temp328
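Instead of starting one awk per file through xargs, the same Found/Empty/Error classification can run over the whole input in a single awk process. A sketch (the function names classify_paths and file_status are hypothetical); it calls close() after each check, which matters once many paths are read:

```shell
#!/bin/sh
# Sketch: classify each path from "date|path" lines in one awk process.
classify_paths() {
    awk -F'|' '
        function file_status(file,    n, line) {
            n = (getline line < file)
            close(file)                  # allow many files and repeated paths
            if (n > 0)  return "Found"
            if (n == 0) return "Empty"
            return "Error"               # missing or unreadable
        }
        { print file_status($2) ": " $2 }
    '
}

printf '2015-03-02|/etc/passwd\n2015-03-02|/no/such/file\n' | classify_paths
```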

For more general use, you could shorten this function to something like:

function file_exists(file) {
  if ((getline _ < file) >= 0) { return 1; }
}

Upvotes: 1

tripleee

Reputation: 189357

The second attempt was fairly close; you need a space after the test -f.

base$ echo '2015|/etc/mtab
> 2015|/etc/ntab' | awk -F\| '{ if (system("test -f " $2)) print $2}'
/etc/ntab

You probably want to invert the test to if (system(...)==0) to get the semantics you expected. Also, somewhat more elegantly, Awk lets you put the condition outside the braces as a pattern, so you can avoid the explicit if.

awk -F\| 'system("test -f " $2)==0 { print $2 }'

Agree with commenters that using Awk for this is borderline nuts.

If, as indicated in comments, you need to work with completely arbitrary file names, you can add code to quote any shell specials:

awk -F\| 'system ("test -f " gensub(/[^\/A-Za-z0-9]/, "\\\\&", "g", $2))==0 {
   print $2 }'   # caveat: gensub() is gawk only

... but your overall solution does not cope with file names containing a newline character or a pipe character (since you are using those as record and field separators, respectively) so again, abandoning Awk and starting over with a different approach may be the sane way forward.

(The character class in the substitution is incomplete; there are various punctuation characters etc which could be added, and I may be missing something significant; but on quick examination, the superfluous backslashes should be harmless. If you don't have Gawk, see here and/or, again, consider abandoning this approach.)
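Without gawk, a common alternative to escaping individual characters is to wrap the whole name in single quotes and turn every embedded single quote into '"'"' (close quote, a double-quoted quote, reopen quote). A sketch in POSIX awk; the function names shquote and existing_files are hypothetical, and it still cannot cope with a newline or a pipe in the file name because of the record and field separators:

```shell
#!/bin/sh
# Sketch: portable (non-gawk) quoting for the system("test -f ...") approach.
existing_files() {
    awk -F'|' '
        # wrap s in single quotes for /bin/sh; each embedded single quote
        # becomes quote-doublequote-quote-doublequote-quote
        function shquote(s) {
            gsub(/\047/, "\047\042\047\042\047", s)
            return "\047" s "\047"
        }
        system("test -f " shquote($2)) == 0 { print $2 }
    '
}

printf '2015|/etc/passwd\n2016|/etc/ntab\n' | existing_files
```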

while IFS='|' read -r stuff filename; do
    test -f "$filename" && echo "$filename"
done <<':'
2015|/etc/mtab
2016|/etc/ntab
2017|/path/to/file with whitespace in name
2018|/path/to/file\with[funny"characters*in(file'name|even pipes, you see?
:

(Still no way to have a newline, but everything else should be fine.)

Upvotes: 7
