mk1138

Reputation: 55

Octave - dlmread and csvread convert the first value to zero

When I read a csv file in Octave, the very first value in it is converted to zero. I tried both csvread and dlmread, and neither reports an error. I can open the file in a plain text editor and see the correct value there. As far as I can tell, there are no hidden characters, stray spaces, or similar in the csv file, and the files contain only numbers. The only thing that I feel might be important is that I have five columns/groups that each have a different number of values in them.

I went through the documentation for both commands on Octave Forge and I still do not know what may be causing this. Does anyone have an idea of what I could check?

To try to illustrate the issue, if I try to load a file with the contents:

1.1,2.1,3.1,4.1,5.1 
,2.2,3.2,4.2,5.2 
,2.3,3.3,4.3, 
,,3.4,4.4 
,,3.5,

Command window will return:

0.0,2.1,3.1,4.1,5.1 
,2.2,3.2,4.2,5.2 
,2.3,3.3,4.3, 
,,3.4,4.4 
,,3.5,

(with additional trailing zeros after the decimal point).

Command syntaxes I'm using are:

dt = csvread("FileName.csv")

and

dt = dlmread("FileName.csv",",")

and they both return the same.

Upvotes: 2

Views: 1491

Answers (1)

Tasos Papastylianou

Reputation: 22225

Your csv file contains a UTF-8 byte order mark (BOM) right before the first number. You can confirm this by opening the file in a hex editor: you will see the byte sequence EF BB BF before the numbers start.
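If you don't have a hex editor at hand, the same check can be done from Octave itself. This is only a minimal sketch: the function name has_utf8_bom is made up for illustration (it is not a built-in), and it only looks for the UTF-8 mark EF BB BF, not the UTF-16 or UTF-32 variants.

```octave
function tf = has_utf8_bom(fname)
  % Hypothetical helper (not part of Octave): return true if the file
  % starts with the UTF-8 byte order mark EF BB BF.
  fid = fopen(fname, "r");
  if fid < 0
    error("has_utf8_bom: could not open %s", fname);
  end
  head = fread(fid, 3, "uint8")';      % first three bytes as a row vector
  fclose(fid);
  tf = isequal(head, [239 187 191]);   % 0xEF 0xBB 0xBF in decimal
end
```

Calling has_utf8_bom("FileName.csv") should return true for a file like the one in the question and false once the mark is gone.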

This causes the first entry to be interpreted as a 'string'. Strings are parsed based on whether there are numbers in 'front' of the string sequence, and here the invisible mark comes before the digits, so the entry is parsed as the number zero. (see also this answer for more details on how csv entries are parsed).

In my text editor, if I start at the top left of the file and press the right arrow key once, the cursor doesn't visibly move (meaning I've just gone over the invisible byte order mark, which takes no visible space). Pressing backspace at this point deletes the byte order mark, after which the csv is read properly. Alternatively, you may have to fix your file in a hex editor, or find some other way to convert it to a proper ASCII file (or UTF-8 without the byte order mark).
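As one such "other way", the mark can be stripped in Octave before reading the data. The following is a sketch under assumptions rather than a definitive fix: strip_utf8_bom and the output file name are hypothetical, and the function copies the file byte for byte, dropping only a leading EF BB BF if it is there.

```octave
function strip_utf8_bom(src, dst)
  % Hypothetical helper (not part of Octave): copy src to dst, dropping a
  % leading UTF-8 byte order mark (EF BB BF) if one is present.
  fid = fopen(src, "r");
  if fid < 0
    error("strip_utf8_bom: could not open %s", src);
  end
  bytes = fread(fid, Inf, "uint8=>uint8")';   % whole file as raw bytes
  fclose(fid);

  if numel(bytes) >= 3 && isequal(bytes(1:3), uint8([239 187 191]))
    bytes = bytes(4:end);                     % skip the three BOM bytes
  end

  fid = fopen(dst, "w");
  if fid < 0
    error("strip_utf8_bom: could not open %s for writing", dst);
  end
  fwrite(fid, bytes, "uint8");
  fclose(fid);
end
```

With that in place, strip_utf8_bom("FileName.csv", "FileName_nobom.csv") followed by dt = dlmread("FileName_nobom.csv", ",") should give 1.1 as the first value. Writing to a separate file keeps the original untouched in case something else depends on it.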

Also, it may be worth checking how this file was produced; if you have any control over that process, perhaps you can find out why the mark was added in the first place and prevent it. E.g., if this was exported from Excel, you can choose the plain 'csv' format instead of 'utf-8 csv'.

UPDATE

In fact, this issue seems to have already been submitted as a bug and fixed in the development branch of Octave. See #58813 :)

Upvotes: 2
