Reputation: 276
I was wondering how bad the performance impact would be for a program migrated from C to a shell script.
It does intensive I/O.
For example, in C, I have a loop that reads from one file and writes into another. I take pieces of each line at fixed positions, with no consistent pattern between them, walking the line with pointers. A really simple program.
In the shell script, to move through a line, I'm using ${var:offset:length}. After I finish processing each line, I append the result to another file with echo "$out" >> "$outFileName".
The program does something like:
while read line; do
    out="${line:10:16}.${line:45:2}"
    out="$out${line:106:61}"
    out="$out${line:189:3}"
    out="$out${line:215:15}"
    ...
    echo "$out" >> "$outFileName"
done < "$fileName"
The problem is that the C program takes about half a minute to process a 400 MB file, while the shell script takes 15 minutes.
I don't know whether I'm doing something wrong or just not using the right operator in the shell script.
Edit: I cannot use awk, since there is no pattern I can use to split up the lines.
I tried commenting out the echo "$out" >> "$outFileName" line, but it doesn't get much better. I think the problem is the ${line:106:61} operation. Any suggestions?
Thanks for your help.
Upvotes: 11
Views: 6900
Reputation: 276
As donitor and Dietrich suggested, I did a little research on the AWK language and, again, as they said, it was a total success. Here is a little example of the AWK program:
#!/bin/awk -f
{
    # The field at columns 5-13 marks the records we care about
    option=substr($0, 5, 9);
    if (option=="SOMETHING"){
        # Map the one-character type code at column 80 to a two-digit code
        type=substr($0, 80, 1)
        if (type=="A"){
            type="01";
        }else if (type=="B"){
            type="02";
        }else if (type=="C"){
            type="03";
        }
        # Stitch the fixed-position fields together and append the result
        # to the file named by the second command-line argument
        print substr($0, 7, 3) substr($0, 49, 8) substr($0, 86, 8) type \
              substr($0, 568, 30) >> ARGV[2]
    }
}
And it works like a charm. It takes barely a minute to process a 500 MB file.
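A minimal invocation sketch (the file names here are just placeholders):

chmod +x process.awk
./process.awk input.dat output.dat

One caveat: since the script appends to ARGV[2], awk will also try to read output.dat as a second input file after it finishes input.dat. Saving the name and blanking the argument in a BEGIN block (BEGIN { out = ARGV[2]; ARGV[2] = "" } and then printing to out) is the usual safeguard.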
Upvotes: 3
Reputation: 72639
What's wrong with the C program? Is it broken? Too hard to maintain? Too inflexible? You are more of a Shell than a C expert?
If it ain't broke, don't fix it.
A look at Perl might be an option, too. Easier than C to modify and still speedy I/O; and it's much harder to create useless forks in Perl than in the shell.
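For instance, a Perl one-liner along these lines (untested, using the offsets from the question and assuming every line is long enough for them) would replace the whole loop:

perl -ne 'print substr($_,10,16), ".", substr($_,45,2), substr($_,106,61), substr($_,189,3), substr($_,215,15), "\n"' "$fileName" > "$outFileName"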
If you told us exactly what the C program does, maybe there's a simple and faster-than-light solution with sed, grep, awk or other gizmos in the Unix tool box. In other words, tell us what you actually want to achieve, don't ask us to solve some random problem you ran into while pursuing what you think is a step towards your actual goal.
Alright, one problem with your shell script is that echo "$out" >> "$outFileName" re-opens the output file on every iteration. Use this instead, which opens it only once:
while IFS= read -r line; do
    echo "${line:10:16}.${line:45:2}${line:106:61}${line:189:3}${line:215:15}..."
done < "$fileName" > "$outFileName"
As an alternative, simply use the cut utility (but note that it doesn't insert the dot after the first part, and that cut counts columns from 1 while bash substring offsets start at 0, so ${line:10:16} corresponds to columns 11-26):
cut -c 11-26,46-47,107-167 "$fileName" > "$outFileName"
You get the idea?
Upvotes: 2
Reputation: 272257
I suspect, based on your description, that you're spawning off new processes in your shell script. If that's the case, then that's where your time is going. It takes a lot of OS resources to fork/exec a new process.
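A quick, unscientific way to see that cost (both loops run 1000 times, but only the first forks a new process on each iteration; exact numbers will vary by machine):

time for i in {1..1000}; do d=$(date); done    # command substitution: one fork/exec per iteration
time for i in {1..1000}; do d=$SECONDS; done   # builtin variable: no extra processes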
Upvotes: 4