VictorGram

Reputation: 2661

How to delete a record and keep on reading a file?

I have to read a sequential file that has over a million records. I have to read each line/record, delete that line/record from the file, and keep on reading.

I can't find any example of how to do that without using a temporary file or creating/recreating a new file of the same name.

These are text files. Each file is about 0.5 GB and holds over a million lines/records.

Currently we copy all the records into memory, because we do not want to re-process any record if anything goes wrong in the middle of processing a file.

Upvotes: 0

Views: 1152

Answers (4)

DJClayworth

Reputation: 26856

Assuming that the file in question is a simple sequential file - you can't. In the Java file model, deleting part of a file implies deleting all of it after the deletion point.

Some alternative approaches are:

  • In your process copy the file, omitting the parts you want deleted. This is the normal way of doing this.
  • Overwrite the parts of the file you want deleted with some value that you know never occurs in the file, and then at a later date copy the file, removing the marked parts.
  • Store the entire file in memory, edit it as required, and write it again. Just because you have a million records doesn't make that impossible. If your files are 0.5GB, as you say, then this approach is almost certainly viable.
  • Each time you delete some record, copy all of the contents of the file after the deletion to its new position. This will be incredibly inefficient and error-prone.
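The second option above (marking records in place) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name Tombstone, the '#' marker byte, and the idea that the caller already knows each record's byte offset and length are all hypothetical and not part of the original answer. It also assumes a single-byte encoding such as ASCII, so that byte counts and character counts match.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class Tombstone {
    // Hypothetical marker byte; assumed never to appear at the start of a real record.
    static final byte MARKER = (byte) '#';

    /**
     * Overwrites recordLength bytes starting at recordStart with the marker.
     * The line terminator is not touched, so the file keeps its size and line count.
     */
    public static void markDeleted(RandomAccessFile file, long recordStart, int recordLength)
            throws IOException {
        byte[] filler = new byte[recordLength];
        Arrays.fill(filler, MARKER);
        file.seek(recordStart);
        file.write(filler);
    }

    /** A later clean-up pass can use this to skip records that were marked. */
    public static boolean isDeleted(String line) {
        return !line.isEmpty() && line.charAt(0) == (char) MARKER;
    }
}
```

The marked records still occupy space until a later pass copies the file and drops them, so this only postpones the copy rather than avoiding it.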

Unless you can store the file in memory, using a temporary file is the most efficient. That's why everyone does it.

If this is some kind of database, then that's an entirely different question.

EDIT: Since I answered this, comments have indicated that what the user wants is to use deletion to keep track of which records have already been processed. If that is the case, there are much simpler ways of doing this. One good way is to write a file which just contains a count of how many bytes (or records) of the file have been processed. If the processor crashes, update the file by deleting the records that have been processed and start again.
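A minimal sketch of the progress-file idea, assuming the count is stored as a byte offset in a small side file. The class name CheckpointedReader, the side-file layout, and the handler callback are hypothetical choices for illustration. On restart this version simply seeks past the processed bytes instead of deleting them, which is a simplification of what the answer describes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class CheckpointedReader {
    /**
     * Processes every line of data that has not been processed yet.
     * The byte offset of the next unprocessed record is kept in the checkpoint file.
     */
    public static void process(Path data, Path checkpoint, Consumer<String> handler)
            throws IOException {
        long offset = 0;
        if (Files.exists(checkpoint)) {
            String saved = new String(Files.readAllBytes(checkpoint),
                    StandardCharsets.US_ASCII).trim();
            if (!saved.isEmpty()) {
                offset = Long.parseLong(saved);
            }
        }
        try (RandomAccessFile file = new RandomAccessFile(data.toFile(), "r")) {
            file.seek(offset); // skip everything a previous run already handled
            String line;
            while ((line = file.readLine()) != null) {
                handler.accept(line);
                // Record progress only after the handler succeeded, so a crash
                // re-processes at most the record that was in flight.
                Files.write(checkpoint,
                        Long.toString(file.getFilePointer()).getBytes(StandardCharsets.US_ASCII));
            }
        }
    }
}
```

Two caveats: RandomAccessFile.readLine reads unbuffered and maps each byte to one character, so it is slow and only suits single-byte encodings; and rewriting the checkpoint after every record is costly for a million records, so saving it every few thousand records may be a better trade-off if re-processing a small batch is acceptable.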

Upvotes: 4

Iazel

Reputation: 2336

Why not a simple sed -si '/line I want to delete/d' big_file?

Upvotes: 0

DwB

Reputation: 38300

Files are unstructured streams of bytes; there is no record structure. You cannot "delete" a "line" from an unstructured stream of bytes.

The basic algorithm you need to use is this:

  1. Create a temporary file.
  2. Open the input file.
  3. If at the end of the input file, go to step 7.
  4. Read a line from the input file.
  5. If the line is not to be deleted, write it to the temporary file.
  6. Go to step 3.
  7. Close the input file.
  8. Close the temporary file.
  9. Delete (or just rename) the input file.
  10. Rename (or move) the temporary file to have the original name of the input file.
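The steps above might look like this in Java. It is a sketch, not a definitive implementation: the class name LineFilter, the shouldDelete predicate, and the UTF-8 encoding are assumptions made for the example. Steps 9 and 10 are combined into a single move that replaces the input file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

public class LineFilter {
    /** Rewrites input, dropping every line for which shouldDelete returns true. */
    public static void removeLines(Path input, Predicate<String> shouldDelete)
            throws IOException {
        // Step 1: create the temporary file next to the input,
        // so the final move stays on the same file system.
        Path dir = input.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "filter-", ".tmp");

        // Steps 2 to 8: copy the lines that are kept; try-with-resources closes both files.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!shouldDelete.test(line)) {
                    writer.write(line);
                    writer.newLine(); // platform line separator, may differ from the input's
                }
            }
        }

        // Steps 9 and 10: replace the input file with the filtered copy.
        Files.move(temp, input, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call such as LineFilter.removeLines(path, line -> line.startsWith("X")) would drop every line beginning with X. Only one line is held in memory at a time, so the size of the file does not matter.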

Upvotes: 1

Anton

Reputation: 559

There is a similar question asked, "Java - Find a line in a file and remove".

Basically, the answers there all use a temp file, and there is no harm in doing so. So why not just do it? It will not affect your performance much and can avoid some errors.

Upvotes: 0
