Reputation: 2783
I have the following CSV data that I am trying to parse in Octave. Note that the values in the last column are empty:
102,19700101,,0.485,
111,19700101,,0.48,
I have defined my line format as:
lineFormat = [repmat('%s',1,1), ...
repmat('%f',1,1), ...
repmat('%q',1,1), ...
repmat('%f',1,1), ...
repmat('%q',1,1)];
How can I read this in with textscan
? When I try:
C = textscan(fid, lineFormat, 'Delimiter', ',')
I incorrectly get the following (notice that the second line from the CSV is shifted):
C =
{
[1,1] =
{
[1,1] = 102
[2,1] = 19700101
}
[1,2] =
1.9700e+07
NaN
[1,3] =
{
[1,1] =
[2,1] = 0.48
}
[1,4] =
0.48500
110.00000
[1,5] =
{
[1,1] = 111
[2,1] = 19700101
}
}
I've also tried with 'MultipleDelimsAsOne'
but the last column value is still omitted. How do I read my CSV data in properly with textscan
? This code works as expected in MATLAB, but not in Octave.
Running Octave 4.2.2 on Ubuntu 16.04.
Upvotes: 0
Views: 577
Reputation: 2783
It appears this is a bug in Octave: https://savannah.gnu.org/bugs/index.php?57612
I got around this by adding an extra comma to the end of my CSV files whose lines ended in a comma. Since Octave ignores the final comma, adding a second comma causes Octave to not ignore the second-to-last one:
102,19700101,,0.485,,
111,19700101,,0.48,,
Here's a shell one-liner to fix all the CSV files in a directory:
find ${1:-.} -type f -name *.csv -exec sed -i -e 's/,$/,,/g' {} \;
This is not a great solution, just a work-around for the existing bug.
Upvotes: 0
Reputation: 18905
For your example, setting the EndOfLine
parameter helped for me (Windows 10, Octave 5.1.0):
C = textscan(fid, lineFormat, 'Delimiter', ',', 'EndOfLine', '\n')
The output seems correct:
C =
{
[1,1] =
{
[1,1] = 102
[2,1] = 111
}
[1,2] =
19700101
19700101
[1,3] =
{
[1,1] =
[2,1] =
}
[1,4] =
0.48500
0.48000
[1,5] =
{
[1,1] =
[2,1] =
}
}
Now I wanted to test your %q
columns and expanded your example:
102,19700101,,0.485,
111,19700101,,0.48,
111,19700101,,0.48,"test"
111,19700101,"test",0.48,
Unfortunately, the above solution doesn't work properly here:
C =
{
[1,1] =
{
[1,1] = 102
[2,1] = 111
[3,1] = 111
[4,1] =
}
[1,2] =
19700101
19700101
19700101
111
[1,3] =
{
[1,1] =
[2,1] =
[3,1] =
[4,1] = 19700101
}
[1,4] =
0.48500
0.48000
0.48000
[1,5] =
{
[1,1] =
[2,1] =
[3,1] = test
}
}
But, when switching from %q
to %s
in lineformat
, it works as expected:
C =
{
[1,1] =
{
[1,1] = 102
[2,1] = 111
[3,1] = 111
[4,1] = 111
}
[1,2] =
19700101
19700101
19700101
19700101
[1,3] =
{
[1,1] =
[2,1] =
[3,1] =
[4,1] = "test"
}
[1,4] =
0.48500
0.48000
0.48000
0.48000
[1,5] =
{
[1,1] =
[2,1] =
[3,1] = "test"
[4,1] =
}
}
I have no explanation for that; a bug maybe? If you can live with removing the double quotes yourself afterwards, this (still) might a solution for you.
Hope that helps!
Upvotes: 1