Reputation: 27258
The sample installation and configuration instructions seem to suggest that OpenGrok requires two staging areas. The rationale is that one area is a work area for index regeneration and the other is the production area, and the two are rotated with every index regeneration.
Is that really necessary? Can I have just one area instead of two?
I'm looking for an answer that is specific to OpenGrok, not a general list of race conditions one might encounter.
Upvotes: 1
Views: 233
Reputation: 315
Strictly speaking, this is not necessary. In fact, I am pretty sure the overwhelming majority of deployments run without a staging area.
That said, you need to decide whether you are comfortable with a window of inconsistency that could result in some failed or imprecise searches. Assume that the source was updated (e.g. via git pull in the case of Git) and the indexer has not finished processing the new changes yet, so the index still reflects the old state of the source.
Say the changes removed a file. If someone then runs a search that matches the contents of the removed file, the search will probably end with an error. That is arguably the better outcome. Consider a more subtle change to a file, such as the addition or removal of a couple of lines of code: the symbol definitions will be off, so the search results will take you to the wrong line of code. Or take a less subtle change: when a function definition is removed from a file, the search results for references to that function will point to places that are no longer valid.
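To make the "wrong line of code" case concrete, here is a toy model in Python. It is not OpenGrok code and does not use OpenGrok's index format; build_index and the sample source are made up for illustration. It only shows how line numbers recorded at index time drift once lines are added to the file.

```python
# Toy model, NOT OpenGrok code: an "index" that remembers the line number
# of each function definition at the time the indexer ran.

def build_index(lines):
    """Map each name defined with 'def name(' to its 1-based line number."""
    index = {}
    for lineno, text in enumerate(lines, start=1):
        stripped = text.strip()
        if stripped.startswith("def "):
            name = stripped[4:].split("(", 1)[0]
            index[name] = lineno
    return index

old_source = [
    "import os",
    "",
    "def helper():",
    "    pass",
    "",
    "def main():",
    "    helper()",
]

# The indexer ran against the old state of the source.
index = build_index(old_source)

# The source is then updated (think: git pull) and two lines are added
# at the top of the file, but the index has not been regenerated yet.
new_source = ["# license header", "import sys"] + old_source

stale_line = index["main"]                      # line the stale index reports
actual_line = build_index(new_source)["main"]   # where the definition is now

print("stale index says line", stale_line, "->", repr(new_source[stale_line - 1]))
print("definition is really on line", actual_line)
```

Until the index is regenerated, a lookup of main lands on an unrelated line of the updated file.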
The length of the inconsistency window is determined by the indexing time, which currently depends largely on two things: the amount of incoming changes and the size of the source tree.
The first matters because of history processing. The more incoming history changes (e.g. changesets in Git), the more work the indexer has to do to generate the history cache and/or the history fields for the index (assuming history handling is on).
The second matters because the indexer traverses the whole source directory tree to find out which files have changed, which can incur lots of syscalls and potentially lots of I/O. That will remain the case at least until https://github.com/oracle/opengrok/issues/3077 is implemented, and even then it will help only Source Code Management systems based on changesets.
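As a rough illustration of why a full tree traversal scales with the size of the source tree rather than with the size of the change, here is a minimal Python sketch. It is an assumption-laden stand-in, not how OpenGrok detects changes: find_changed_files, the mtime comparison, and the demo file names are all hypothetical.

```python
import os
import tempfile

def find_changed_files(root, since):
    """Return (changed_paths, stat_calls) for files under root newer than 'since'.

    'since' is a timestamp in epoch seconds, e.g. the time of the last index run.
    """
    changed = []
    stat_calls = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)      # one stat per file, changed or not
            stat_calls += 1
            if st.st_mtime > since:
                changed.append(path)
    return changed, stat_calls

def demo():
    """Build a tiny tree with three files, only one of which is 'new'."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        paths = [
            os.path.join(root, "a.c"),
            os.path.join(root, "b.c"),
            os.path.join(root, "sub", "c.c"),
        ]
        for p in paths:
            with open(p, "w") as f:
                f.write("int x;\n")
            os.utime(p, (1000, 1000))       # pretend every file is old
        os.utime(paths[2], (3000, 3000))    # one file changed after the last run
        return find_changed_files(root, since=2000)

if __name__ == "__main__":
    changed, stat_calls = demo()
    print(len(changed), "changed file(s) found after", stat_calls, "stat calls")
```

Even though only one file changed, every file in the tree is visited, so on a large source tree the traversal alone can take a noticeable share of the indexing time.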
Upvotes: 0