Sridhar Sarnobat

Reputation: 25236

Bash Pipelined Shell scripts I wrote are Starved/Deadlocked/Performing poorly

I have used duff to create a report of duplicate files in my file system:

duff ~/Photos > ~/Photos/duplicates.txt

I have written some groovy scripts to transform this report into an HTML page where I can view the duplicate photos in my browser:

cat duplicates.txt | filter_non_existent.groovy | duff_to_json.groovy | json_to_html.groovy > duplicates.html

About 50% of the time, it works beautifully without any problems.

Other times some of my scripts don't start executing, or run very slowly. Why is this, and what can I do to prevent it? (and why don't "professionally written" command line programs like grep suffer from this problem?)

More information

Upvotes: 0

Views: 47

Answers (2)

rici

Reputation: 241721

Why do you check if the stream is ready? All you want to do is wait for data to be available and read it when it is available; if you just use an ordinary potentially-blocking read, that is what will happen.

If you exit the read loop as soon as no data is immediately available, which is what your code excerpt seems to be doing, then it is likely that your program will terminate early, at least some of the time. If you just read (possibly blocking until data is available) until you see an EOF, then things should Just Work. (That is what grep and other "professionally written programs" do.)
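As a minimal sketch of that pattern in plain shell (the function name pass_through is made up for illustration, and your Groovy scripts would use the JVM equivalent of a blocking line read instead), a filter that only stops at EOF looks roughly like this:

```shell
# Pass stdin through line by line, blocking until the upstream stage
# either writes more data or closes its end of the pipe (EOF).
# pass_through is a hypothetical name used only for this sketch.
pass_through() {
    # `read` blocks while the pipe is empty and returns non-zero at EOF.
    # The `|| [ -n "$line" ]` part keeps a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}
```

Feeding it from a slow producer, for example ( printf 'first\n'; sleep 1; printf 'second\n' ) | pass_through, should still print both lines, because the loop waits through the pause instead of exiting when no data is immediately available.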

Upvotes: 2

niken

Reputation: 2611

It's hard to say why with the limited info you've provided. Every time you pipe to another instance of Groovy, you're starting up a new JVM, which has startup overhead and will tax your system. (Check the default startup settings for Java.)

The other thing is that your scripts sleep for 2.5 seconds, then poll for work, then go back to sleep for 2.5 seconds. If you have 3 scripts chained together and each one is waiting for output from the previous one, which is itself sleeping, the delays add up and you can end up waiting as long as:

2.5 + 2.5 + 2.5 = 7.5 seconds

Or longer if the work one of the other scripts has to do takes more than a second (like processing lots of Strings directly in Java). It may be worth your while to learn to use bash, grep, etc.: first, to avoid re-inventing the wheel, and second, to use the right tool for the job.

Groovy is great, but for what you're doing you can probably filter things with grep -v. As for duff to JSON and JSON to HTML, maybe write your adapter to go to HTML directly? It sounds like whatever you're doing to convert the text is probably not very efficient...
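One caveat: grep -v filters by text pattern, so it cannot itself check whether a path still exists on disk. If that is what filter_non_existent.groovy does (an assumption based only on its name), a rough shell sketch of the same step could look like the following. It assumes one path per line on stdin, which may not match duff's actual report layout, and keep_existing is a made-up name:

```shell
# Keep only the input lines that name an existing file.
# Assumption: stdin carries one path per line; every other line
# (blank lines, headers, stale paths) is dropped.
# keep_existing is a hypothetical name used only for this sketch.
keep_existing() {
    # Blocking read until EOF, so the stage never exits early.
    while IFS= read -r path || [ -n "$path" ]; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

That would replace one JVM startup in the pipeline with a shell loop, for example keep_existing < duplicates.txt, with the remaining conversion steps left as they are.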

Upvotes: 2
