Rahul Reddy

Reputation: 128

Execute Large C Program By Generating Intermediate Stages

I have an algorithm that takes 7 days to run to completion (and a few more algorithms like it).

Problem: In order to run the program successfully, I need a continuous power supply. If I'm out of luck and the power fails in the middle, I have to restart it from the beginning.

So I would like a way to make my program execute in phases (say each phase generates results A, B, C, ...), so that in case of a power loss I can somehow use these intermediate results and resume the run from that point.

Problem 2: How do I prevent a file from being reopened every time a loop iterates? (fopen was placed in a loop that runs nearly a million times; this seemed necessary because the file is modified on each iteration.)

Upvotes: 0

Views: 200

Answers (5)

weismat

Reputation: 7411

Sounds like a classical batch-processing problem to me.
You will need to define checkpoints in your application and store the intermediate data until a checkpoint is reached.
A checkpoint could be a row number in a database, or a position inside a file.
Your processing might take a bit longer than it does now, but it will be more reliable.
In general, you should also think about where the bottleneck in your algorithm is.
For problem 2, you should use two files; your application might be days faster if you call fopen a million times fewer...
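For illustration, a minimal sketch of such a checkpoint in C, assuming the intermediate state can be reduced to a plain struct (the file names, fields, and checkpoint interval here are hypothetical, not from the question):

    #include <stdio.h>

    /* Hypothetical state: whatever your algorithm needs to resume.
       Adjust the fields to match your real intermediate results. */
    struct checkpoint {
        long iteration;      /* how far the outer loop has progressed */
        double partial[16];  /* example partial results A, B, C, ... */
    };

    /* Write the state to a temp file, then rename it over the old
       checkpoint, so a power loss never leaves a half-written file
       as the only copy (rename is atomic on POSIX filesystems). */
    static int save_checkpoint(const struct checkpoint *cp)
    {
        FILE *f = fopen("state.tmp", "wb");
        if (!f) return -1;
        if (fwrite(cp, sizeof *cp, 1, f) != 1) { fclose(f); return -1; }
        if (fclose(f) != 0) return -1;
        return rename("state.tmp", "state.chk");
    }

    /* Returns 1 if a previous run left a usable checkpoint, 0 otherwise. */
    static int load_checkpoint(struct checkpoint *cp)
    {
        FILE *f = fopen("state.chk", "rb");
        if (!f) return 0;
        int ok = fread(cp, sizeof *cp, 1, f) == 1;
        fclose(f);
        return ok;
    }

    int main(void)
    {
        struct checkpoint cp = {0};
        if (!load_checkpoint(&cp))
            cp = (struct checkpoint){0};        /* no checkpoint: start fresh */

        for (long i = cp.iteration; i < 1000000; i++) {
            /* ... one unit of work, updating cp.partial ... */
            if (i % 10000 == 0) {               /* checkpoint every N iterations */
                cp.iteration = i;
                save_checkpoint(&cp);
            }
        }
        return 0;
    }

After a power loss you simply rerun the program; at worst you lose the work done since the last saved checkpoint.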

Upvotes: 0

wmorrison365

Reputation: 6318

Problem 2: Opening the file on each iteration of the loop because it's changed

I may not be the best qualified to answer this, but calling fopen (and fclose) on each iteration certainly seems wasteful and slow. To answer properly, or to let someone more qualified answer, I think we'd need to know more about your data.

For instance:

  1. Is it text or binary?
  2. Are you processing records or a stream of text? That is, is it a file of records or a stream of data? (You aren't cracking genes, are you? :-)

I ask because, judging by your comment "because it's changed each iteration", you might be better off using a random-access file. I'm guessing you're re-opening the file in order to get back to a point you may already have passed (in your stream of data) and make a change. However, if you open the file in binary mode, you can move anywhere in it using fseek and fsetpos. That is, you can seek backwards without reopening.

Additionally, if your data is record-based or otherwise organised, you could also build an index for it. With that, you could use fsetpos to jump straight to the record you're interested in and traverse from there, saving time in finding the area of data to change. You could even persist the index in an accompanying index file.

Note that you can write plain text to a binary file. Perhaps worth investigating?
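As a rough sketch of the idea, here is a loop that opens a file of fixed-size binary records once and updates records in place with fseek, instead of reopening the file every iteration (the file name and record layout are made up for the example):

    #include <stdio.h>

    /* Hypothetical fixed-size record; adjust to your real data layout. */
    struct record {
        long   id;
        double value;
    };

    int main(void)
    {
        /* Open once, outside the loop, in binary read/update mode. */
        FILE *f = fopen("data.bin", "rb+");
        if (!f) { perror("fopen"); return 1; }

        for (long i = 0; i < 1000000; i++) {
            struct record r;

            /* Seek directly to the i-th record instead of reopening the file. */
            if (fseek(f, i * (long)sizeof r, SEEK_SET) != 0) break;
            if (fread(&r, sizeof r, 1, f) != 1) break;

            r.value *= 2.0;                     /* ... modify the record ... */

            /* Seek back and overwrite the record in place (the fseek also
               satisfies the C requirement to reposition between a read and
               a write on an update-mode stream). */
            fseek(f, i * (long)sizeof r, SEEK_SET);
            fwrite(&r, sizeof r, 1, f);
        }

        fclose(f);
        return 0;
    }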

Upvotes: 0

wmorrison365

Reputation: 6318

Well, a couple of options, I guess:

  1. You split your algorithm along sensible lines, with each phase producing a defined output that can be the input to the next phase. Then configure your algorithm as a workflow (ideally soft-configured through some declaration file); a rough sketch of such a phase driver follows below.
  2. You add logic to your algorithm by which it knows what it has successfully completed (committed). Then, on failure, you can restart the algorithm; it bins all uncommitted data and resumes from the last commit point.

Note that both of these options may draw out your 7-day run time further!
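A minimal sketch of option 1 in C, assuming each phase leaves behind an output file that marks it as done; the phase names, file names, and driver loop are illustrative only:

    #include <stdio.h>

    /* Hypothetical phase table: each phase reads the previous phase's
       output file and writes its own. */
    struct phase {
        const char *output;          /* file that marks the phase as done */
        int (*run)(void);            /* returns 0 on success              */
    };

    static int phase_a(void) { /* ... compute and write "result_a.dat" ... */ return 0; }
    static int phase_b(void) { /* ... read a, write "result_b.dat" ...     */ return 0; }
    static int phase_c(void) { /* ... read b, write "result_c.dat" ...     */ return 0; }

    static int already_done(const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (f) { fclose(f); return 1; }
        return 0;
    }

    int main(void)
    {
        struct phase phases[] = {
            { "result_a.dat", phase_a },
            { "result_b.dat", phase_b },
            { "result_c.dat", phase_c },
        };

        /* After a power loss, simply rerun the program: phases whose
           output files survived are skipped, the rest are redone. */
        for (size_t i = 0; i < sizeof phases / sizeof phases[0]; i++) {
            if (already_done(phases[i].output))
                continue;
            if (phases[i].run() != 0) {
                fprintf(stderr, "phase %zu failed\n", i);
                return 1;
            }
        }
        return 0;
    }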

So, to improve the overall runtime, could you also separate your algorithm so that it has "worker" components that can work on "jobs" in parallel? This usually means extracting some "dumb" but intensive logic (such as a computation) that can be parameterised. Then you have the option of running your algorithm on a grid/space/cloud/whatever, which at least gives you options for reducing the run time. It doesn't even need to be a space: just use queues (IBM MQ Series has a C interface) and have listeners on other boxes consuming your jobs queue, doing the work, and persisting the results. You can still phase the algorithm as discussed above, too.

Upvotes: 2

Ben M

Reputation: 22492

When each result phase is complete, branch off to a new universe. If the power fails in the new universe, destroy it and travel back in time to the point at which you branched. Repeat until all phases are finished, and then merge your results into the original universe via a transcendental wormhole.

Upvotes: 2

asaelr

Reputation: 5456

You can separate it into several source files and use make.

Upvotes: 2
