Reputation: 315
Suppose we have a large CSV file formatted as
Unit, Date, Customer ID, Data_1, Data_2, ..., Data_n, Description
Unit, Date, Customer ID, Data_1, Data_2, ..., Data_n, Description
etc.
and we define, say, the variables dBegin = '2010-05-01'; and dEnd = '2011-05-01';
Is it possible to quickly reposition the file pointer to the beginning of the row corresponding to the first instance of dBegin in column 2?
The file I'm working with is already sorted by date, so this would save a lot of time when extracting subsets by date range, compared with iterating through the file line by line and checking whether each entry falls within the indicated date range.
Upvotes: 0
Views: 114
Reputation: 8091
I guess the length of each line is not constant, which would make it impossible to use fseek to set the file pointer to the beginning of a line without reading it first (which would then make setting the file pointer useless).
You write in another question that your input file is big and speed matters. In this case I would suggest doing the preprocessing (finding the start and end dates and keeping only those rows) with tools that are designed for fast string processing.
I created an example input file.csv:
5,2010-05-01, Customer ID1, DataA
9,2011-05-02, Customer ID2, DataB
1,2011-05-04, Customer ID3, DataC
3,2011-05-06, Customer ID4, DataD
8,2011-05-08, Customer ID5, DataE
and preprocess it with AWK (a standard tool on GNU/Linux; for Windows see http://gnuwin32.sourceforge.net/packages/gawk.htm):
awk 'BEGIN{FS=","}$2~/2011-05-02/{f=1;}; f==1{print $0}; $2~/2011-05-06/{exit}' file.csv
This returns the following (I would also print only the needed columns):
9,2011-05-02, Customer ID2, DataB
1,2011-05-04, Customer ID3, DataC
3,2011-05-06, Customer ID4, DataD
Then use textread to import this reduced set.
If you could post a concrete text file of perhaps 50 lines, we could help better.
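The AWK command above matches the two dates literally, so it prints nothing if the start date does not occur in the file and it stops at the first row carrying the end date. A minimal sketch of a range-based variant follows. It assumes the dates are in ISO format (YYYY-MM-DD) with no space between the comma and the date, as in the example file, so that plain string comparison orders them correctly. The shell variables dBegin and dEnd and the output name subset.csv are only illustrative.

```shell
# Recreate the example input file from above.
cat > file.csv <<'EOF'
5,2010-05-01, Customer ID1, DataA
9,2011-05-02, Customer ID2, DataB
1,2011-05-04, Customer ID3, DataC
3,2011-05-06, Customer ID4, DataD
8,2011-05-08, Customer ID5, DataE
EOF

# Illustrative date range (inclusive on both ends).
dBegin='2011-05-02'
dEnd='2011-05-06'

# ISO dates compare correctly as strings, so no date parsing is needed.
awk -F, -v b="$dBegin" -v e="$dEnd" '
  $2 > e  { exit }    # file is sorted by date: no later row can match, stop reading
  $2 >= b { print }   # row lies inside [b, e]: keep it
' file.csv > subset.csv

cat subset.csv
```

Because the script exits as soon as it sees a date past dEnd, it never reads the rest of a sorted file, although it still has to scan the rows before dBegin. The reduced file can then be imported as described above.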
Upvotes: 1