Reputation: 2852
I have an awk script which splits a big file into several files by some condition. Then I run another script over each file in parallel:
awk -f script.awk -v DEST_FOLDER=tmp input.file
find tmp/ -name "*.part" | xargs -P $ALLOWED_CPUS --replace --verbose /bin/bash -c "./process.sh {}"
The question is: is there any way to run ./process.sh:
The purpose of the optimization is to avoid waiting until awk is done when some files are already ready to be processed.
Upvotes: 0
Views: 133
Reputation: 33725
You basically give the answer yourself: GNU Parallel + inotifywait will work.
Since you are not allowed to use inotifywait, you can write your own substitute for it. If you are allowed to write your own script, you are also allowed to run GNU Parallel (as that is just a script).
So something like this:
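awk -f script.awk -v DEST_FOLDER=tmp input.file &
awk_pid=$!
shopt -s nullglob
declare -A size emitted   # last seen size per file / files already handed to parallel
while :; do
  kill -0 "$awk_pid" 2>/dev/null && running=1 || running=0
  pending=0
  for f in tmp/*.part; do
    [ -n "${emitted[$f]}" ] && continue
    new=$(stat -c %s "$f")   # GNU stat: file size in bytes
    if [ "${size[$f]}" = "$new" ]; then echo "$f"; emitted[$f]=1   # unchanged for 1s: assume done
    else size[$f]=$new; pending=1; fi
  done
  [ "$running" = 0 ] && [ "$pending" = 0 ] && break   # awk finished and every file emitted
  sleep 1
done | parallel './process.sh {}; rm {}'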
It is assumed that awk writes to each file at least once per second until it is done with that file. If that takes longer, adjust the sleep interval accordingly.
Upvotes: 1
Reputation: 158100
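Once awk has finished writing a file, you can start the processing directly from awk. Close the file first so its contents are flushed, then hand its name to process.sh:
awk '{close(name_of_created_file); system("./process.sh \"" name_of_created_file "\" &")}'
The & sends each process.sh to the background, so that they can run in parallel and awk does not wait for them. Note that this starts one process per file, with no limit on how many run at once. Piping the name instead (print name_of_created_file | "./process.sh &") does not work as intended: a command backgrounded by a shell without job control gets /dev/null as its standard input, so process.sh would never see the name. close() and system() are standard POSIX awk, not gawk extensions. Check the manual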
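A self-contained sketch of the same idea with the bounded parallelism of the question's xargs -P: awk closes each part file as soon as it is complete and writes its name to a single xargs process, which runs the worker on up to MAX_JOBS files at a time. The split rule (input grouped by its first field, so a file is complete when that field changes), the demo input and the stand-in process.sh are all hypothetical; the real condition lives in script.awk. fflush() with an argument is assumed to be available, as it is in gawk and mawk.

```bash
#!/usr/bin/env bash
# Sketch only. Hypothetical assumptions: the input is grouped by its first field,
# so a part file is complete as soon as that field changes, and file names
# contain no whitespace (xargs splits on it).
set -eu
MAX_JOBS=${ALLOWED_CPUS:-4}
mkdir -p tmp
printf 'a 1\na 2\nb 3\nc 4\nc 5\nc 6\n' > input.file

# Hypothetical stand-in for the real ./process.sh: count the lines of its file.
cat > process.sh <<'EOF'
#!/bin/sh
wc -l < "$1" | tr -d ' ' > "$1.count"
EOF
chmod +x process.sh

awk -v DEST_FOLDER=tmp -v jobs="$MAX_JOBS" '
  BEGIN { cmd = "xargs -n 1 -P " jobs " ./process.sh" }
  # Close a finished part file (flushing it), then hand its name to xargs.
  function finish(f) {
    close(f)
    print f | cmd
    fflush(cmd)   # send the name to xargs now instead of buffering it
  }
  $1 != key { if (out != "") finish(out); key = $1; out = DEST_FOLDER "/" key ".part" }
  { print > out }
  END {
    if (out != "") finish(out)
    close(cmd)    # closing the pipe waits for xargs and all its jobs to finish
  }
' input.file
```

Compared with the polling loop, this needs no sleep heuristics, because awk itself knows when it has finished a file. The catch is that it only works if the split condition lets awk tell when a part file is complete.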
Upvotes: 1