wcarhart
wcarhart

Reputation: 2783

How to parse CSV file with empty values in Octave?

I have the following CSV data that I am trying to parse in Octave. Note that the values in the last column are empty:

102,19700101,,0.485,
111,19700101,,0.48,

I have defined my line format as:

lineFormat = [repmat('%s',1,1), ...
         repmat('%f',1,1), ...
         repmat('%q',1,1), ...
         repmat('%f',1,1), ...
         repmat('%q',1,1)];

How can I read this in with textscan? When I try:

C = textscan(fid, lineFormat, 'Delimiter', ',')

I incorrectly get the following (notice that the second line from the CSV is shifted):

C = 
{
  [1,1] = 
  {
    [1,1] = 102
    [2,1] = 19700101
  }
  [1,2] =

     1.9700e+07
            NaN

  [1,3] = 
  {
    [1,1] = 
    [2,1] = 0.48
  }
  [1,4] =

       0.48500
     110.00000

  [1,5] = 
  {
    [1,1] = 111
    [2,1] = 19700101
  }
}

I've also tried with 'MultipleDelimsAsOne' but the last column value is still omitted. How do I read my CSV data in properly with textscan? This code works as expected in MATLAB, but not in Octave.

Running Octave 4.2.2 on Ubuntu 16.04.

Upvotes: 0

Views: 577

Answers (2)

wcarhart
wcarhart

Reputation: 2783

It appears this is a bug in Octave: https://savannah.gnu.org/bugs/index.php?57612

I got around this by adding an extra comma to the end of my CSV files whose lines ended in a comma. Since Octave ignores the final comma, adding a second comma causes Octave to not ignore the second-to-last one:

102,19700101,,0.485,,
111,19700101,,0.48,,

Here's a shell one-liner to fix all the CSV files in a directory:

find ${1:-.} -type f -name *.csv -exec sed -i -e 's/,$/,,/g' {} \;

This is not a great solution, just a work-around for the existing bug.

Upvotes: 0

HansHirse
HansHirse

Reputation: 18905

For your example, setting the EndOfLine parameter helped for me (Windows 10, Octave 5.1.0):

C = textscan(fid, lineFormat, 'Delimiter', ',', 'EndOfLine', '\n')

The output seems correct:

C =
{
  [1,1] =
  {
    [1,1] = 102
    [2,1] = 111
  }
  [1,2] =
     19700101
     19700101
  [1,3] =
  {
    [1,1] =
    [2,1] =
  }
  [1,4] =
     0.48500
     0.48000
  [1,5] =
  {
    [1,1] =
    [2,1] =
  }
}

Now I wanted to test your %q columns and expanded your example:

102,19700101,,0.485,
111,19700101,,0.48,
111,19700101,,0.48,"test"
111,19700101,"test",0.48,

Unfortunately, the above solution doesn't work properly here:

C =
{
  [1,1] =
  {
    [1,1] = 102
    [2,1] = 111
    [3,1] = 111
    [4,1] =
  }
  [1,2] =

     19700101
     19700101
     19700101
          111
  [1,3] =
  {
    [1,1] =
    [2,1] =
    [3,1] =
    [4,1] = 19700101
  }
  [1,4] =

     0.48500
     0.48000
     0.48000
  [1,5] =
  {
    [1,1] =
    [2,1] =
    [3,1] = test
  }
}

But, when switching from %q to %s in lineformat, it works as expected:

C =
{
  [1,1] =
  {
    [1,1] = 102
    [2,1] = 111
    [3,1] = 111
    [4,1] = 111
  }
  [1,2] =

     19700101
     19700101
     19700101
     19700101
  [1,3] =
  {
    [1,1] =
    [2,1] =
    [3,1] =
    [4,1] = "test"
  }
  [1,4] =
     0.48500
     0.48000
     0.48000
     0.48000
  [1,5] =
  {
    [1,1] =
    [2,1] =
    [3,1] = "test"
    [4,1] =
  }
}

I have no explanation for that; a bug maybe? If you can live with removing the double quotes yourself afterwards, this (still) might a solution for you.

Hope that helps!

Upvotes: 1

Related Questions