Sargera

Position File Pointer to First Instance of a Value in a CSV File

Suppose we have a large CSV file formatted as

Unit, Date, Customer ID, Data_1, Data_2, ..., Data_n, Description

Unit, Date, Customer ID, Data_1, Data_2, ..., Data_n, Description

etc.

and we define the variables dBegin = '2010-05-01'; and dEnd = '2011-05-01'; (for example).

Is it possible to quickly reposition the file pointer to the beginning of the first row whose second column equals dBegin?

The file I'm working with is already sorted by date. Jumping directly to that row would save a lot of time when extracting subsets by date range, compared with iterating through the file line by line and checking whether each entry falls within the date range.

Answers (1)

Andy

I assume the lines are not all the same length. That makes it impossible to use fseek to jump straight to the beginning of a particular line without first reading the file up to that line, which would make setting the file pointer pointless.
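The following minimal sketch illustrates why the file has to be read to find that position. It is written in Python, standing in for MATLAB's fgetl/ftell/fseek, which follow the same pattern. The function name first_row_offset is hypothetical. The function records the byte position before each line and returns it once the second column matches. That offset can then be passed to seek, but finding it has already cost a pass over every earlier line.

```python
import io

def first_row_offset(f, target_date):
    """Return the byte offset of the first row whose 2nd column equals
    target_date, or -1 if no row matches. f must be opened in binary mode
    so that tell() gives a usable byte position after readline()."""
    target = target_date.encode("ascii")
    while True:
        offset = f.tell()          # position of the line about to be read
        line = f.readline()
        if not line:               # reached end of file without a match
            return -1
        fields = line.split(b",")
        # strip() tolerates the spaces after commas seen in the question
        if len(fields) > 1 and fields[1].strip() == target:
            return offset

# In-memory stand-in for open("file.csv", "rb"), using the example rows
f = io.BytesIO(
    b"5,2010-05-01, Customer ID1, DataA\n"
    b"9,2011-05-02, Customer ID2, DataB\n"
    b"1,2011-05-04, Customer ID3, DataC\n"
)
pos = first_row_offset(f, "2011-05-02")
f.seek(pos)                        # file pointer now at the start of that row
print(f.readline())
```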

You wrote in another question that your input file is large and that speed matters. In that case, I would suggest preprocessing the file with a tool designed for fast string processing. The tool would find the start and end dates and keep only the rows between them.

I created an example input file, file.csv:

  5,2010-05-01, Customer ID1, DataA
  9,2011-05-02, Customer ID2, DataB
  1,2011-05-04, Customer ID3, DataC
  3,2011-05-06, Customer ID4, DataD
  8,2011-05-08, Customer ID5, DataE

and preprocessed it with AWK (a standard tool on GNU/Linux; for Windows, see http://gnuwin32.sourceforge.net/packages/gawk.htm):

  awk 'BEGIN{FS=","}$2~/2011-05-02/{f=1;}; f==1{print $0}; $2~/2011-05-06/{exit}' file.csv

This returns the following (in practice, I would also print only the columns that are needed):

  9,2011-05-02, Customer ID2, DataB
  1,2011-05-04, Customer ID3, DataC
  3,2011-05-06, Customer ID4, DataD

Then use textread to import this reduced set.
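For readers without AWK, the following is a variant of the same filter, sketched in Python. The function name rows_in_range is hypothetical. The AWK version matches the exact start and end dates with regular expressions. This version instead compares the date column as a string, which works because YYYY-MM-DD strings sort in chronological order. Because the file is sorted, it also stops reading as soon as it passes the end date.

```python
import io

def rows_in_range(f, d_begin, d_end):
    """Yield rows whose 2nd column lies in [d_begin, d_end].
    Assumes rows are sorted by that column and dates are ISO YYYY-MM-DD,
    so plain string comparison matches chronological order."""
    for line in f:
        fields = line.split(",")
        if len(fields) < 2:        # skip blank or malformed lines
            continue
        date = fields[1].strip()
        if date > d_end:           # sorted input: no later row can match
            break
        if date >= d_begin:
            yield line.rstrip("\n")

# In-memory stand-in for open("file.csv") with the example rows
sample = io.StringIO(
    "5,2010-05-01, Customer ID1, DataA\n"
    "9,2011-05-02, Customer ID2, DataB\n"
    "1,2011-05-04, Customer ID3, DataC\n"
    "3,2011-05-06, Customer ID4, DataD\n"
    "8,2011-05-08, Customer ID5, DataE\n"
)
for row in rows_in_range(sample, "2011-05-02", "2011-05-06"):
    print(row)
```

On the example file this selects the same three rows as the AWK command. Unlike the one-liner, it still works when dBegin or dEnd does not occur in the file. It also keeps every row dated dEnd, whereas the AWK version exits after the first such row.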

If you can post a concrete sample of your text file (perhaps 50 lines), we could help you better.
