Reputation: 81
I have a .CSV file containing 100 000 records. I need to parse a set of records and then delete them, then parse the next set, and so on until the end of the file. How can I do this? A code snippet would be very helpful.
I tried, but I could not delete the processed records and reuse the same CSV file with only the remaining records left in it.
Upvotes: 0
Views: 3852
Reputation: 133
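The code below is tested and works. It blanks out a line of an existing CSV file in place: every character except the commas is overwritten with a space, so the row stays in the file as an empty record. The snippet handles the first line; for other rows, keep the row numbers to delete in an array and seek to the start of each of those lines. Please check it and let me know.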
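import java.io.File;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class BlankFirstLine {
    public static void main(String[] args) throws Exception {
        File f = new File(System.getProperty("user.home") + "/Desktop/c.csv");
        try (RandomAccessFile ra = new RandomAccessFile(f, "rw")) {
            ra.seek(0);
            long p = ra.getFilePointer(); // start of the line to blank out
            // readLine() maps each byte to one char, so ISO-8859-1 restores the bytes 1:1
            byte[] b = ra.readLine().getBytes(StandardCharsets.ISO_8859_1);
            for (int i = 0; i < b.length; i++) {
                if (b[i] != ',') { // replace everything except commas
                    b[i] = ' ';
                }
            }
            ra.seek(p);  // go back to the initial pointer of the line
            ra.write(b); // write a blank line with commas as column separators
        }
    }
}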
Upvotes: 0
Reputation: 119
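You cannot remove data from the middle of a file in place. Ideally you should generate a new file for your output. In your case, once you reach the point where the processed data should be deleted, create a new file, copy the remaining lines to it, and use this new file as the input from then on. The code below copies a whole file from one stream to another; to keep only the remaining lines, skip the ones already processed before writing: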
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyFile {
    public static void main(String[] args) throws IOException {
        File infile = new File("C:\\MyInputFile.txt");
        File outfile = new File("C:\\MyOutputFile.txt");
        // try-with-resources closes both streams, even if the copy fails
        try (FileInputStream instream = new FileInputStream(infile);
             FileOutputStream outstream = new FileOutputStream(outfile)) {
            byte[] buffer = new byte[1024];
            int length;
            // copy the contents from the input stream to the output stream
            while ((length = instream.read(buffer)) > 0) {
                outstream.write(buffer, 0, length);
            }
        }
    }
}
Upvotes: 0
Reputation: 70564
This cannot be done efficiently, since CSV is a sequential file format. Say you have
"some text", "adsf"
"more text", "adfgagqwe"
"even more text", "adsfasdf"
...
and you want to remove the second line:
"some text", "adsf"
"even more text", "adsfasdf"
...
you need to move up all subsequent lines (which in your case can be 100 000 ...), which involves reading them at their old location and writing them to the new one. That is, deleting the first of 100 000 lines involves reading and writing 99 999 lines of text, which will take a while ...
It is therefore worthwhile to consider alternatives. For instance, if you are trying to process a file and want to keep track of how far you got, it is far more efficient to store the line number (or offset in bytes) you were at and leave the input file intact. This also prevents corrupting the file if your program crashes while deleting the lines. Another approach is to first split the file into many small files (perhaps 1000 lines each), process each file in its entirety, and then delete that file.
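The line-number approach could look roughly like the sketch below. The class name CsvBatchReader, its method names, and the idea of keeping the position in a small side file are all illustrative assumptions, not part of the question. It also assumes one record per line (no quoted fields containing line breaks) and UTF-8 input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a CSV in batches and remembers progress in a side file.
final class CsvBatchReader {

    /** Returns up to batchSize lines, skipping the first linesDone lines of the file. */
    static List<String> nextBatch(Path csv, long linesDone, int batchSize) throws IOException {
        List<String> batch = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            for (long i = 0; i < linesDone; i++) {
                if (in.readLine() == null) {
                    return batch; // checkpoint is at or past the end of the file
                }
            }
            String line;
            while (batch.size() < batchSize && (line = in.readLine()) != null) {
                batch.add(line);
            }
        }
        return batch;
    }

    /** Reads the number of lines already processed; 0 if no checkpoint file exists yet. */
    static long loadCheckpoint(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Stores the number of lines already processed. */
    static void saveCheckpoint(Path checkpoint, long linesDone) throws IOException {
        Files.write(checkpoint, Long.toString(linesDone).getBytes(StandardCharsets.UTF_8));
    }
}
```

A caller would loop: load the checkpoint, fetch a batch, process it, then save the old count plus the batch size. Skipping by line number rereads the start of the file for every batch; storing a byte offset and seeking to it would avoid that, at the cost of slightly more bookkeeping.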
However, if you truly must delete lines from a CSV file, the most robust way is to read the entire file, write all records you want to keep to a new file, delete the original file, and finally rename the new file to the original file.
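A minimal sketch of that rewrite-and-rename approach follows, for the case of dropping the first processed lines. The class and method names are hypothetical. It assumes one record per line and UTF-8 input, and it writes the platform line separator, so line endings may change. Files.move with REPLACE_EXISTING stands in for the separate delete and rename steps.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: removes leading lines by rewriting the file.
final class CsvLineRemover {

    /** Rewrites csv so that its first count lines are gone. */
    static void dropFirstLines(Path csv, long count) throws IOException {
        // Create the temp file next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(csv.toAbsolutePath().getParent(), "csv-", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            long index = 0;
            while ((line = in.readLine()) != null) {
                if (index++ >= count) { // keep only lines after the first count lines
                    out.write(line);
                    out.newLine();
                }
            }
        }
        // Replace the original with the new file (covers "delete, then rename").
        Files.move(tmp, csv, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, CsvLineRemover.dropFirstLines(Paths.get("c.csv"), 1000) would leave everything after the first 1000 lines. The sketch does not clean up the temp file if copying fails; production code would probably want to.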
Upvotes: 1