Reputation: 1111
My current text file that I intend to use for LSTM training in Tensorflow looks like this:
> 0.2, 4.3, 1.2
> 1.1, 2.2, 3.1
> 3.5, 4.1, 1.1, 4300
>
> 1.2, 3.3, 1.2
> 1.5, 2.4, 3.1
> 3.5, 2.1, 1.1, 4400
>
> ...
Each sample consists of 3 sequences of 3-feature vectors with only 1 label. I formatted the text file this way so it stays consistent with LSTM training, which needs the time-steps of the sequences, or in general a 3D tensor (batch, num of time-steps, num of features).
My question: how should I use NumPy or TensorFlow.TextReader to reformat the 3x3 sequence vectors and the singleton labels so they become compatible with TensorFlow?
Edit: I have seen many tutorials on reformatting text or CSV files that have vectors and labels, but unfortunately they cover only 1-to-1 relationships, e.g.
0.2, 4.3, 1.2, Class1
1.1, 2.2, 3.1, Class2
3.5, 4.1, 1.1, Class3
becomes:
[0.2, 4.3, 1.2, Class1], [1.1, 2.2, 3.1, Class2], [3.5, 4.1, 1.1, Class3]
which is clearly readable by NumPy, and vectors for simple feed-forward NN tasks can easily be built from it. But this procedure doesn't actually build an LSTM-friendly CSV.
EDIT: The TensorFlow tutorial on CSV formats covers only 2D arrays as an example. The features = col1, col2, col3 layout doesn't assume that there might be time-steps for each sequence array, hence my question.
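For clarity, the layout I'm after would look something like this (a rough sketch in NumPy; shapes inferred from the two samples above):
import numpy as np

# hypothetical target: 2 samples, 3 time-steps, 3 features, 1 label each
X = np.array([[[0.2, 4.3, 1.2],
               [1.1, 2.2, 3.1],
               [3.5, 4.1, 1.1]],
              [[1.2, 3.3, 1.2],
               [1.5, 2.4, 3.1],
               [3.5, 2.1, 1.1]]])   # shape (2, 3, 3) = (batch, time-steps, features)
y = np.array([4300, 4400])          # shape (2,), one label per sample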
Upvotes: 1
Views: 1432
Reputation: 231325
I'm a little confused as to whether you are more interested in the numpy array(s) structure, or the csv format.
The np.savetxt csv file writer can't readily produce text like:
0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300
1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
savetxt is not tricky. It opens a file for writing, and then iterates on the input array, writing one row at a time to the file. Effectively:
for row in arr:
    f.write(fmt % tuple(row) + '\n')   # one formatted line per row
where fmt has a % field for each element of the row. In the simple case it constructs the row format as delimiter.join([fmt] * arr.shape[1]), in other words repeating the single-field fmt for the number of columns. Or you can give it a multifield fmt.
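A rough sketch of that idea (not the actual savetxt source, just the effect):
import numpy as np

arr = np.arange(6.).reshape(2, 3)
fmt, delimiter = '%.1f', ', '
row_fmt = delimiter.join([fmt] * arr.shape[1])   # '%.1f, %.1f, %.1f'
with open('out.txt', 'w') as f:
    for row in arr:
        f.write(row_fmt % tuple(row) + '\n')     # one formatted line per row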
So you could use normal line/file writing methods to write a custom display. The simplest is to construct it using the usual print commands, and then redirect those to a file.
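For example, assuming the data already sits in an (n, 3, 3) block array plus an (n,) label array (my names, not yours), print with a file argument can reproduce your block layout:
import numpy as np

blocks = np.array([[[0.2, 4.3, 1.2], [1.1, 2.2, 3.1], [3.5, 4.1, 1.1]],
                   [[1.2, 3.3, 1.2], [1.5, 2.4, 3.1], [3.5, 2.1, 1.1]]])
labels = np.array([4300, 4400])

with open('blocks.txt', 'w') as f:
    for block, lbl in zip(blocks, labels):
        print(', '.join('%.1f' % v for v in block[0]), file=f)
        print(', '.join('%.1f' % v for v in block[1]), file=f)
        print(', '.join('%.1f' % v for v in block[2]), '%d' % lbl, sep=', ', file=f)
        print(file=f)                            # blank line between samples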
But having done that, there's the question of how to read that back into a numpy session. np.genfromtxt can handle missing data, but you still have to include the delimiters. It's also trickier to have it read blocks (3 lines separated by a blank line). It's not impossible, but you have to do some preprocessing.
Of course genfromtxt isn't that tricky either. It reads the file line by line, converts each line into a list of numbers or strings, and collects those lists in a master list. Only at the end is that list converted into an array.
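One possible preprocessing step (a sketch, assuming the blank-line-separated layout from the question and a hypothetical file name data.txt) is to collapse each block into a single flat line, label first:
def flatten_blocks(fname):
    # turn each 3-line block into 'label, v1, ..., v9'
    with open(fname) as f:
        text = f.read()
    for chunk in text.strip().split('\n\n'):              # blocks separated by a blank line
        rows = [r.split(',') for r in chunk.strip().splitlines()]
        label = rows[-1][-1].strip()                      # the extra field on the last row
        vals = [v.strip() for r in rows for v in r][:-1]  # the 9 block values
        yield ', '.join([label] + vals)

lines = list(flatten_blocks('data.txt'))
# lines[0] == '4300, 0.2, 4.3, 1.2, 1.1, 2.2, 3.1, 3.5, 4.1, 1.1'
Those lines can then be fed straight to genfromtxt with the structured dtype built below (older numpy versions may want bytes rather than str lines).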
I can construct an array like your text with:
In [121]: dt = np.dtype([('lbl',int), ('block', float, (3,3))])
In [122]: A = np.zeros((2,),dtype=dt)
In [123]: A
Out[123]:
array([(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
dtype=[('lbl', '<i4'), ('block', '<f8', (3, 3))])
In [124]: A['lbl']=[4300,4400]
In [125]: A[0]['block']=np.array([[.2,4.3,1.2],[1.1,2.2,3.1],[3.5,4.1,1.1]])
In [126]: A
Out[126]:
array([(4300, [[0.2, 4.3, 1.2], [1.1, 2.2, 3.1], [3.5, 4.1, 1.1]]),
(4400, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
dtype=[('lbl', '<i4'), ('block', '<f8', (3, 3))])
In [127]: A['block']
Out[127]:
array([[[ 0.2, 4.3, 1.2],
[ 1.1, 2.2, 3.1],
[ 3.5, 4.1, 1.1]],
[[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]]])
I can load it from a txt that has all the block values flattened:
In [130]: txt=b"""4300, 0.2, 4.3, 1.2, 1.1, 2.2, 3.1, 3.5, 4.1, 1.1"""
In [131]: txt
Out[131]: b'4300, 0.2, 4.3, 1.2, 1.1, 2.2, 3.1, 3.5, 4.1, 1.1'
genfromtxt can handle a complex dtype, allocating values in order from the flat line list:
In [133]: data=np.genfromtxt([txt],delimiter=',',dtype=dt)
In [134]: data['lbl']
Out[134]: array(4300)
In [135]: data['block']
Out[135]:
array([[ 0.2, 4.3, 1.2],
[ 1.1, 2.2, 3.1],
[ 3.5, 4.1, 1.1]])
I'm not sure about writing it. I would have to reshape it into a 10-column or 10-field array if I want to use savetxt.
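One possible route (a sketch, continuing with the A array above): stack the label column next to the flattened blocks, giving a plain 10-column 2D array that savetxt accepts:
flat = np.hstack([A['lbl'][:, None], A['block'].reshape(len(A), -1)])   # shape (2, 10)
np.savetxt('flat.csv', flat, delimiter=', ',
           fmt=['%d'] + ['%.1f'] * 9)   # integer label, then the 9 float block values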
Upvotes: 1
Reputation: 210812
UPDATE: in addition to the previous answer:
df.stack().to_csv('d:/temp/1D.csv', index=False)
1D.csv:
0.2
4.3
1.2
4300.0
1.1
2.2
3.1
4300.0
3.5
4.1
1.1
4300.0
1.2
3.3
1.2
4400.0
1.5
2.4
3.1
4400.0
3.5
2.1
1.1
4400.0
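Reading that flat file back into the 3D tensor plus a label vector could look like this (a sketch; the reshape assumes 3 time-steps per sample and 4 stacked values per time-step, i.e. 3 features plus the repeated label):
import pandas as pd

vals = pd.read_csv('d:/temp/1D.csv', header=None).values.reshape(-1, 3, 4)
X = vals[:, :, :3]   # (2, 3, 3) tensor: (batch, time-steps, features)
y = vals[:, 0, 3]    # (2,) labels: 4300.0, 4400.0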
OLD answer:
Here is a Pandas solution.
Assume we have the following text file:
0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300
1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
Code:
import pandas as pd
In [95]: fn = r'D:\temp\.data\data.txt'
In [96]: df = pd.read_csv(fn, sep=',', skipinitialspace=True, header=None, names=list('abcd'))
In [97]: df
Out[97]:
a b c d
0 0.2 4.3 1.2 NaN
1 1.1 2.2 3.1 NaN
2 3.5 4.1 1.1 4300.0
3 1.2 3.3 1.2 NaN
4 1.5 2.4 3.1 NaN
5 3.5 2.1 1.1 4400.0
In [98]: df.d = df.d.bfill()
In [99]: df
Out[99]:
a b c d
0 0.2 4.3 1.2 4300.0
1 1.1 2.2 3.1 4300.0
2 3.5 4.1 1.1 4300.0
3 1.2 3.3 1.2 4400.0
4 1.5 2.4 3.1 4400.0
5 3.5 2.1 1.1 4400.0
Now you can save it back to CSV:
df.to_csv('d:/temp/out.csv', index=False, header=None)
d:/temp/out.csv:
0.2,4.3,1.2,4300.0
1.1,2.2,3.1,4300.0
3.5,4.1,1.1,4300.0
1.2,3.3,1.2,4400.0
1.5,2.4,3.1,4400.0
3.5,2.1,1.1,4400.0
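From there, getting the (batch, time-steps, features) tensor the LSTM expects is a single reshape (a sketch, again assuming 3 time-steps per sample):
import pandas as pd

arr = pd.read_csv('d:/temp/out.csv', header=None).values   # shape (6, 4)
X = arr[:, :3].reshape(-1, 3, 3)                           # (2, 3, 3) feature tensor
y = arr[::3, 3]                                            # one label per 3-row sample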
Upvotes: 1