c++: lseek giving different values compared to the original file

Question

I'm trying to read a file that contains double-formatted numbers in an 82503x1200 matrix. I can read the file, but I can't find a way to specify the correct size of each number so that lseek lands on it. Why is it giving me these numbers instead of the numbers in the file?

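#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // lseek, read, close
#include <cstdio>     // perror
#include <iostream>
using namespace std;

int main() {
    int fd;           // open() returns an int file descriptor, not a float
    off_t ret;        // lseek() returns an off_t
    float b;
    const size_t NUM_ELEMS = 11;
    const size_t NUM_BYTES = NUM_ELEMS * sizeof(float);

    fd = open("signal_80k.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (size_t seekCounter = 0; seekCounter < 5; ++seekCounter) {
        ret = lseek(fd, seekCounter * NUM_BYTES, SEEK_SET);
        // read() copies sizeof(float) raw bytes into b; no text-to-number conversion
        if (ret < 0 || read(fd, &b, sizeof(float)) != (ssize_t)sizeof(float))
            break;
        cout << "> " << seekCounter << ": " << b << endl;
    }
    close(fd);
}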



It prints:


  0: 1.02564e-08
  1: 1.08604e-05
  2: 0.000174702
  3: 6.56482e-07
  4: 2.57894e-09


But the first values in the file are:
9.402433000000000e 
8.459109000000000e 
8.947654000000000e+03 
9.021620000000000e

This is how it looks in MATLAB.

Sam Varshavchik · Accepted Answer

In your comments you clarified that the file contains text data, and my answer is based on that. Now, let's take a look at the first number in the file:

1.02564e-08

How many characters are there? I count 11. Then there's a space after it, so the next value starts twelve characters after the start of this one.

By casual inspection, it appears that your code sets

 const size_t NUM_ELEMS = 11;

to be the number of values per row.

Then your code sets

 const size_t NUM_BYTES = NUM_ELEMS * sizeof(float);

to calculate the number of characters taken up by each row. It's possible I've misread the meaning of these constants, but the bottom line is the same: you have a target value in the file, and you're trying to seek to it directly. So, for the purpose of this answer, I'll go with this interpretation; the conclusion doesn't change either way.

Pop quiz for you. What is sizeof(float)?

Answer: it's 4 bytes on most implementations (so I'll assume that going forward). So you compute that each row takes 44 characters, and you use that to try to seek to the appropriate line in the file. That, at least, is how I parsed your code.

The problem, of course, is that if each value is written in scientific notation, with 11 values per line and each value taking up 12 characters (including either a trailing space or a newline), each line actually takes 11 * 12 = 132 characters, not 44. Add one more character if your operating system uses \r\n (CR LF) for a newline.

So you need to make some adjustments there. And even after that, this whole house of cards depends on every value in the file always being written in scientific notation with the same precision.

Which is an assumption you can't really make. Furthermore, that's not the only problem here.

The second problem is that you are attempting to read() the contents of the file directly into float variables. Yes, read() will fill each float with four bytes, because that's how many bytes it takes to represent a float value in binary. The problem is that the file does not contain raw binary data; it contains text.

In conclusion, I don't see much choice here but to read the file from start to finish instead of trying to seek to the right spot, since you have no guarantee that every value in the file occupies the same number of characters. Read the file as text, and use operator>> to convert its contents to float values.
