ravnur

Reputation: 2852

awk: Output to different processes

I have an awk script which splits a big file into several files by some condition. Then I run another script over each file in parallel.

awk -f script.awk -v DEST_FOLDER=tmp input.file
find tmp/ -name "*.part" | xargs -P $ALLOWED_CPUS --replace --verbose /bin/bash -c "./process.sh {}"

The question is: is there any way to run ./process.sh:

The purpose of the optimization is to avoid waiting until awk is done when some files are already ready to be processed.

Upvotes: 0

Views: 133

Answers (2)

Ole Tange

Reputation: 33725

You basically give the answer yourself: GNU Parallel + inotifywait will work.

Since you are not allowed to use inotifywait, you can write your own substitute for it. If you are allowed to write your own script, you are also allowed to run GNU Parallel (as that is just a script).

So something like this:

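awk -f script.awk -v DEST_FOLDER=tmp input.file &
sleep 1
declare -A size                          # last recorded size per file (bash 4+)
while [ -n "$(ls -A tmp)" ]; do          # loop until tmp is empty
  for f in tmp/*.part; do
    new=$(stat -c %s "$f" 2>/dev/null) || continue   # GNU stat; file may be gone already
    if [ "${size[$f]}" = "$new" ]; then  # size unchanged since last pass:
      echo "$f"; size[$f]=printed        # hand the file to parallel exactly once
    elif [ "${size[$f]}" != printed ]; then
      size[$f]=$new                      # record the new size
    fi
  done
  sleep 1
done | parallel './process.sh {}; rm {}'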

This assumes that awk produces some output within one second, and that a file whose size has not changed for one second is complete. If awk is slower than that, adjust the sleeps accordingly.

Upvotes: 1

hek2mgl

Reputation: 158100

Once you have created a file, you can pass its name to the input of a process or script:

awk '{print name_of_created_file | "./process.sh &"}'

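The & sends process.sh to the background, so that the invocations can run in parallel. Piping print output to a command is part of standard awk, but how the command is started and when the pipe is closed depends on the command string and on close(); check the manual.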
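
A variation on the same idea, as a minimal sketch rather than a drop-in replacement: let awk close each part as soon as it is complete, print the part's name on standard output, and let xargs -P start the workers as names arrive. The split rule (group by the first field, input already grouped), the sample input, and the stand-in process.sh are all assumptions made up for the demo, since the real script.awk is not shown. File names containing whitespace would need extra care with xargs.

```shell
#!/bin/sh
# Self-contained demo; process.sh and the split rule are stand-ins (assumptions).
set -e
work=$(mktemp -d)
cd "$work"
mkdir tmp

# Stand-in worker: writes the line count of a part next to it.
cat > process.sh <<'EOF'
#!/bin/sh
wc -l < "$1" | tr -d ' ' > "$1.count"
EOF
chmod +x process.sh

# Sample input, grouped by the first field (the assumed split condition).
printf 'a 1\na 2\nb 3\nc 4\nc 5\nc 6\n' > input.file

awk -v DEST_FOLDER=tmp '
function finish(f) {
    close(f)        # the part is complete once awk closes it
    print f         # announce it on stdout ...
    fflush()        # ... right away, so the reader is not held up by buffering
}
NR == 1 || $1 != key {
    if (out != "") finish(out)
    key = $1
    out = DEST_FOLDER "/" key ".part"
}
{ print > out }
END { if (out != "") finish(out) }
' input.file | xargs -P 2 -n 1 ./process.sh
```

Because the pipeline only returns once xargs has reaped all workers, nothing needs to poll the directory. Replace -P 2 with $ALLOWED_CPUS to match the original command.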

Upvotes: 1
