Hide cat prompt errors

I would like to set up a script that continuously parses a specific tag in an XML file.

The script contains the following while loop:

function scan_t()
{
INPUT_FILE=${1}
while : ; do
   if [[ -f "$INPUT_FILE" ]]
   then
      ret=`cat ${INPUT_FILE} | grep "<data>" | awk -F"=|>" '{print $2}' | awk -F"=|<" '{print $1}'`
      if [[ "$ret" -ne 0 ]] && [[ -n "$ret" ]]
      then
         ...
      fi
   fi
done
} 
scan_t "/tmp/test.xml"

The line format is:

<data>0</data> or <data>1</data> <data>2</data> ..

Even though I added the condition if [[ -f "$INPUT_FILE" ]] to the script, I sometimes get:

cat: /tmp/test.xml: No such file or directory.

Indeed, $INPUT_FILE is normally consumed by another process, which is charged to suppress the file after reading it.

This while loop is only used for testing, so the cat error doesn't matter, but I would like to hide this message because it pollutes the terminal a lot.

Answers (1)

Peter Cordes

If some other process can also read and remove the file before this script sees it, you've designed your system with a race condition. (I assume that "charged to suppress" means "designed to unlink"...)

If it's optional for this script to see every input file, then just redirect stderr to /dev/null (i.e. ignore errors when the race condition bites). If it's not optional, then have this script rename the input file to something else, and have the other process watch for that. Check for that file existing before you do the rename, to make sure you don't overwrite a file the other process hasn't read yet.
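A minimal sketch of the "just ignore it" option, assuming bash and one <data>N</data> element per line as in the question. The function name read_data_quietly is made up for illustration. It prints nothing if the other process already removed the file, and returns a non-zero status so the caller can simply skip that iteration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <data> values of FILE, one per line,
# or nothing at all if FILE was removed before awk could open it.
read_data_quietly() {
    local file=$1
    # awk opens the file itself; if it has already been unlinked, awk's
    # "can't open file" error goes to /dev/null and awk exits non-zero.
    awk -F '[<>]' '/<data>/ {print $3}' "$file" 2>/dev/null
}
```

In the question's loop this could be used as: if ret=$(read_data_quietly "$INPUT_FILE"); then ...; fi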

Your loop has a horrible design. First, you're busy-waiting (with no sleep at all) on the file coming into existence. Second, you're running 4 programs when the input exists, instead of 1.

The busy-wait can be avoided by using inotifywait to watch the directory for changes. That way the if [[ -f $INPUT_FILE ]] check only runs after something in the directory changes, rather than as fast as a CPU core can spin.

The second problem is simpler to address: never cat file | something. Use either something file, or something < file if something doesn't take filenames on its command line (or behaves differently when given one). cat is only useful when you have multiple files to concatenate. To read a file into a shell variable, use foo=$(<file).
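As a sketch of both points, assuming bash and one <data>N</data> element per line: a single awk process can replace the cat | grep | awk | awk pipeline, and $(<file) lets bash read the file with no extra process at all. Both function names are hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# One process instead of cat | grep | awk | awk: awk takes the filename
# itself, selects <data> lines, and prints the text between the tags.
data_values() {
    awk -F '[<>]' '/<data>/ {print $3}' "$1"
}

# Pure-bash variant using $(<file): no cat, no awk, no subprocess for the
# read. Prints only the first <data> value found in the file.
first_data_value() {
    local xml re='<data>([^<]*)</data>'
    xml=$(<"$1") || return          # fails (status 1) if the file is missing
    [[ $xml =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}
```

Keeping the regex in a variable is the usual way to avoid quoting problems with =~ inside [[ ]].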

I see from comments you've already managed to turn your whole pipeline into a single command. So write

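INPUT_FILE=foo
# -m: keep running and print one line per event; --format %f: print only the filename
inotifywait -m -e close_write -e moved_to --format %f . |
while IFS= read -r event_file; do
    # quote the right-hand side so it is compared literally, not as a glob pattern
    [[ $event_file == "$INPUT_FILE" ]] &&
       awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null

     #  echo "$event_file" &&
     #  date;
done
# tested and working with the commented-out echo/date commands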

Note that I'm waiting for close_write and moved_to, rather than other events, to avoid jumping the gun and reading a file that's not finished being written. Put $INPUT_FILE in its own directory, so you don't get false-positive events waking up your loop for other filenames.
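On the producer side, one way to make sure the watcher only ever sees complete files is to write to a temporary name in the same directory and rename it into place. A sketch, assuming bash and GNU mktemp; publish_file is a hypothetical helper, not part of inotify-tools. The watcher gets close_write for the temporary name (ignored by the filename check) and a single moved_to for the real name.

```shell
#!/usr/bin/env bash
# Hypothetical producer-side helper: write CONTENT to DEST atomically.
# The temp file lives in DEST's directory so mv is a same-filesystem
# rename(2), which makes DEST appear in one step, already complete.
# Note: mktemp creates the file with mode 0600; chmod it if the consumer
# runs as a different user.
publish_file() {
    local dest=$1 content=$2 tmp
    tmp=$(mktemp "$(dirname -- "$dest")/.publish.XXXXXX") || return
    printf '%s\n' "$content" > "$tmp" && mv -f -- "$tmp" "$dest" ||
        { rm -f -- "$tmp"; return 1; }
}
```

Usage: publish_file /tmp/watched/test.xml '<data>1</data>'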

To also implement the rename-to-input-for-next-stage suggestion, you'd put a while [[ -e $INPUT2 ]]; do sleep 0.2; done; mv -n "$INPUT_FILE" "$INPUT2" busy-wait loop after the awk.
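Wrapped up as a function, that hand-off step might look like the following sketch. It assumes bash and an mv that supports -n (GNU and BSD mv do); hand_off and its optional polling interval are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical hand-off helper: wait until the next stage has consumed
# (removed) DST, then rename SRC to DST. mv -n never overwrites an
# existing DST, which covers the small window between the -e test and mv.
hand_off() {
    local src=$1 dst=$2 interval=${3:-0.2}
    while [[ -e $dst ]]; do
        sleep "$interval"     # fractional seconds need GNU sleep or similar
    done
    mv -n -- "$src" "$dst"
}
```

This is still a polling loop, but a cheap one: it only runs while the previous file is waiting to be consumed.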

An alternative would be to run inotifywait once per loop iteration, but then you can get stuck if $INPUT_FILE is created between iterations, before inotifywait has started watching: the producer waits for the consumer to consume the file, and the consumer never sees the event.

# Race condition with an asynchronous producer, DON'T USE
while inotifywait -qq -e close_write -e moved_to .; do
    # no event filename is printed with -qq, so test for the file directly
    [[ -f $INPUT_FILE ]] &&
       awk -F '[<,>]' '/data/ {printf "%s ",$3} END {print ""}' "$INPUT_FILE" 2>/dev/null
done

There doesn't seem to be a way to specify the name of a file that doesn't exist yet, even as a filter, so the loop body needs to check that the specific file exists in the directory before using it.

If you don't have inotifywait available, you could just put a sleep into the loop. GNU sleep supports fractional seconds, like sleep 0.5; BusyBox's sleep may not, depending on how it was built. You might want to write a tiny C program anyway, which keeps trying to open(2) the file in a loop that includes a usleep or nanosleep. When open succeeds, redirect stdin from that file descriptor and exec your awk program. That way, there's no race possible between a stat and an open.
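The same open-once idea can be approximated in the shell itself. A sketch, assuming bash and a sleep that accepts fractional intervals; wait_and_read is a hypothetical name. Instead of testing with -f and then opening, the shell opens the file directly as awk's stdin, so awk never reopens it by name.

```shell
#!/usr/bin/env bash
# Hypothetical polling fallback: wait for FILE to appear, then print its
# <data> values on one line. The shell opens FILE once for awk's stdin; if
# that open fails (file not there yet, or already removed), awk is not
# started and the outer 2>/dev/null discards the shell's error message.
# Caveat: awk's own stderr is discarded too.
wait_and_read() {
    local file=$1 interval=${2:-0.5}
    until { awk -F '[<>]' '/<data>/ {printf "%s ", $3} END {print ""}' < "$file"; } 2>/dev/null; do
        sleep "$interval"
    done
}
```

Each retry still costs one sleep process, which is why a single process looping on open(2) is lighter weight.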

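#include <unistd.h>    // for usleep, dup2, close, execv

#include <sys/types.h>  // for open
#include <sys/stat.h>
#include <fcntl.h>

#include <errno.h>
#include <stdio.h>   // for perror, fprintf
#include <stdlib.h>  // for exit, EXIT_FAILURE

void waitloop(const char *path)
{
    // argv for awk: starts with argv[0] and must be NULL-terminated.
    // No filename argument, so awk reads the already-opened file from stdin
    // instead of reopening it by name (which would reintroduce the race).
    const char *const awk_args[] = { "awk", "-F", "[<,>]",
         "/data/ {printf \"%s \",$3} END {print \"\"}",
         NULL
    };
    while(42) {
        int fd = open(path, O_RDONLY);
        if (-1 != fd) {
            // if you fork() here, you can avoid the shell loop too.

            dup2(fd, 0);  // redirect stdin from fd.  In theory should check for error here, too.
            close(fd);   // and do this in the parent after fork
            execv("/usr/bin/awk", (char * const*)awk_args);  // execv's prototype doesn't prevent it from modifying the strings?
            perror("execv");  // only reached if execv failed
            exit(EXIT_FAILURE);
        } else if(errno != ENOENT) {
            perror("opening the file");
        } // else ignore ENOENT
        usleep(10000);  // 10 milliseconds.
    }
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }
    waitloop(argv[1]);  // never returns: it either execs awk or exits
}
// optional TODO: error-check *all* the system calls.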

This compiles, but I haven't tested it. Looping inside a single process doing open / usleep is much lighter weight than running a whole process to do sleep 0.01 from a shell.

Even better would be to use inotify to watch for directory events to detect the file appearing, instead of usleep. To avoid a race, after setting up the inotify watch, do another check for the file existing, in case it got created after your last check, but before the inotify watch became active.

