Stefano

Reputation: 114

How to map numpy array in tensorflow dataset

I'm trying to load several files into my pipeline; each file contains 3 signals, ordered in 10-minute intervals. When I load the first file it has shape (86, 75000, 3). I'm using TensorFlow 1.14.

I have tried the following code; to make it reproducible, I simulate the loading with zeros:

import numpy as np
import tensorflow as tf


def my_func(x):
    p = np.zeros([86, 75000, 3])
    return p

def load_sign(path):
    sign = tf.compat.v1.numpy_function(my_func, [path], tf.float64)
    return sign

s = [1, 2]  # list of filenames; these are paths, simulated here with numbers

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
x = itera.get_next()

with tf.Session() as sess:
    # sess.run(itera.initializer)
    va_sign = sess.run([x])
    va = np.array(va_sign)
    print(va.shape)

I get this shape: (1, 86, 75000, 3), while I would like to obtain 3 separate variables, each with shape (75000,).

How can I do that? I have also tried the code below, but I get an error.

import numpy as np
import tensorflow as tf


def my_func(x):
    p = np.zeros([86, 75000, 3])
    x = p[:,:,0]
    y = p[:, :, 1]
    z = p[:, :, 2]
    return x, y, z

# load the signals; in this example they are created with zeros
def load_sign(path):
    a, b, c = tf.compat.v1.numpy_function(my_func, [path], tf.float64)
    return tf.data.Dataset.zip((a,b,c))

s = [1, 2]  # list of filenames; these are paths, simulated here with numbers

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
x, y, z = itera.get_next()

with tf.Session() as sess:
    # sess.run(itera.initializer)
    va_sign = sess.run([x])
    va = np.array(va_sign)
    print(va.shape)

Here I would expect x to have shape (86, 75000), but instead I get the error below. How can I make it work? Even better, how can I obtain an x with shape (75000,)?

TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.

Upvotes: 0

Views: 1587

Answers (1)

sebastian-sz

Reputation: 1498

The call

a, b, c = tf.compat.v1.numpy_function(my_func, [path], tf.float64)

wraps a Python function so it can run inside the graph, and it returns the outputs of my_func as tensors, not a dataset (so there is nothing to zip). Since my_func returns three arrays, you must also pass a list of three dtypes. The code should look like this:

def my_func(x):
    p = np.zeros([86, 75000, 3])
    x = p[:,:,0]
    y = p[:, :, 1]
    z = p[:, :, 2]
    return x, y, z

def load_sign(path):
    func = tf.compat.v1.numpy_function(my_func, [path], [tf.float64, tf.float64, tf.float64])
    return func

The rest is pretty much the same, with minor tweaks:

s = [1, 2]  

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE)

itera = tf.data.make_one_shot_iterator(ds)
output = itera.get_next() # Returns tuple of 3: x,y,z from my_func

with tf.Session() as sess:
    va_sign = sess.run([output])[0] # Unnest single-element list
    for entry in va_sign:
      print(entry.shape)

This will yield 3 elements, each of shape (86, 75000).
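As a sanity check that does not require TensorFlow at all, the slicing inside my_func can be verified with plain NumPy (same shapes as in the question):

```python
import numpy as np

# Simulate one loaded file: 86 ten-minute intervals, 75000 samples, 3 signals.
p = np.zeros([86, 75000, 3])

# Split the last axis into the three signals, as my_func does.
x, y, z = p[:, :, 0], p[:, :, 1], p[:, :, 2]

print(x.shape)  # (86, 75000)
print(y.shape)  # (86, 75000)
print(z.shape)  # (86, 75000)
```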

To further preprocess your data and reach shape (75000,), you can make use of tf.data.Dataset.unbatch():

AUTOTUNE = tf.data.experimental.AUTOTUNE  
ds = tf.data.Dataset.from_tensor_slices(s)
ds = ds.map(load_sign, num_parallel_calls=AUTOTUNE).unbatch()

itera = tf.data.make_one_shot_iterator(ds)
output = itera.get_next() # Returns tuple of 3: x,y,z from my_func

The same iteration as above will now give you three elements of shape (75000,).
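Conceptually, unbatch() splits each dataset element along its leading axis. In NumPy terms (an illustrative analogy, not the actual tf.data implementation), it is like iterating over the first dimension:

```python
import numpy as np

# One dataset element before unbatching: shape (86, 75000).
x = np.zeros([86, 75000])

# Splitting along the leading axis, as Dataset.unbatch() does,
# yields 86 separate elements of shape (75000,).
rows = list(x)

print(len(rows))       # 86
print(rows[0].shape)   # (75000,)
```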

Upvotes: 1
