Multidimensional Input for Machine Learning

Question

I have a CSV file with data in the form of:

Timestamp,Signal_1,Signal_2,Signal_3,Signal_4,Signal_5
2021-04-13 11:03:13+02:00,3,3,3,12,12
2021-04-13 11:03:14+02:00,3,3,3,12,12

Now I want to create a NN to do time series forecasting, so I wanted to turn the content into a numpy array and then assign training and test sets. The input as well as the output should be 5-dimensional (all signal groups should be predicted). Currently my code looks like this:

import pandas
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from numpy import genfromtxt
filename = 'test.csv'
data = pandas.read_csv(filename , header=0, index_col=0)
my_data = genfromtxt('test.csv', delimiter=',')
print(data.shape)

print(type(my_data))
v, w, x, y, z = my_data

I am aware that the actual assignment of the test and training parts is missing, but even in this stage I get the error ValueError: too many values to unpack (expected 5)

Tim Jim · Accepted Answer

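Not sure exactly which bit you would like to unpack (it looks like you imported the data twice, once with pandas and once with numpy), but the error comes from the shape of my_data. np.genfromtxt cannot interpret the header row or the timestamp column as numbers, so it keeps them as nan instead of dropping them; for the two sample rows, my_data.shape is (3, 6). Unpacking a 2D array iterates over its rows, so v, w, x, y, z = my_data only works if the array has exactly 5 rows. Your full file has more than that, which causes the too many values to unpack error. For the sample, my_data looks like this: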

array([[nan, nan, nan, nan, nan, nan],
       [nan,  3.,  3.,  3., 12., 12.],
       [nan,  3.,  3.,  3., 12., 12.]])

For the numpy my_data array, you could index to remove the first row and column and transpose to get it the right way up:

v, w, x, y, z = my_data[1:, 1:].T

Which will give you your 1D arrays:

>> v
array([3., 3.])

>> w
array([3., 3.])

>> x
array([3., 3.])

>> y
array([12., 12.])

>> z
array([12., 12.])
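As an alternative to slicing afterwards, np.genfromtxt can be asked to skip the header row and the timestamp column while reading. The following is a minimal sketch, assuming the two sample rows from the question; io.StringIO stands in for test.csv so the snippet runs on its own.

```python
import io
import numpy as np

# Sample rows copied from the question; io.StringIO stands in for 'test.csv'.
csv_text = """Timestamp,Signal_1,Signal_2,Signal_3,Signal_4,Signal_5
2021-04-13 11:03:13+02:00,3,3,3,12,12
2021-04-13 11:03:14+02:00,3,3,3,12,12
"""

# skip_header=1 drops the header row, usecols leaves out the timestamp column,
# and unpack=True transposes the result so each signal becomes its own 1D array.
v, w, x, y, z = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    usecols=(1, 2, 3, 4, 5),
    unpack=True,
)

print(v)  # values of Signal_1, one entry per data row
print(y)  # values of Signal_4
```

This avoids creating the nan row and column in the first place, so no indexing with [1:, 1:] is needed.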

N.B. Just as an aside, if you try to do the same thing using your pandas dataframe data, i.e. v, w, x, y, z = data, you'll actually get the column header strings assigned, not the columns themselves. In this case, you want:

v, w, x, y, z = data.values.T
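To make the difference visible, here is a small sketch using the sample rows from the question (io.StringIO again stands in for test.csv). It uses data.to_numpy(), which returns the same array as data.values here; that substitution is my choice, not something the question's code requires.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'test.csv'.
csv_text = """Timestamp,Signal_1,Signal_2,Signal_3,Signal_4,Signal_5
2021-04-13 11:03:13+02:00,3,3,3,12,12
2021-04-13 11:03:14+02:00,3,3,3,12,12
"""
data = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# Iterating over (and therefore unpacking) a DataFrame yields its column labels.
a, b, c, d, e = data
print(a)  # a column name string, not the column's values

# to_numpy() returns the (n_rows, 5) value array; .T turns it into (5, n_rows),
# so unpacking now assigns one signal column to each name.
v, w, x, y, z = data.to_numpy().T
print(v)
```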

If you want the timestamp too, it's probably easier to use the pandas import, as it handles mixed data more easily. Just reset the index or remove index_col from your read_csv call:

data = pandas.read_csv(filename, header=0)
u, v, w, x, y, z = data.values.T

That will give you your timestamps in u.
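Note that mixing the timestamp strings with the numbers in one array forces an object dtype. Since the question's goal is a 5-dimensional input for forecasting, it may be more convenient to keep the timestamps and the signals separate. The sketch below assumes the sample rows from the question; the 80/20 split ratio and the variable names are illustrative assumptions, not part of the original code.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'test.csv'.
csv_text = """Timestamp,Signal_1,Signal_2,Signal_3,Signal_4,Signal_5
2021-04-13 11:03:13+02:00,3,3,3,12,12
2021-04-13 11:03:14+02:00,3,3,3,12,12
"""

# parse_dates asks pandas to convert the Timestamp column to datetimes.
data = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=["Timestamp"])

timestamps = data["Timestamp"]
# All five signals as one numeric array of shape (n_rows, 5).
signals = data.drop(columns="Timestamp").to_numpy(dtype=float)

# Chronological split without shuffling, so the test set lies after the
# training set in time. The 0.8 ratio is an arbitrary assumption.
split = int(len(signals) * 0.8)
train, test = signals[:split], signals[split:]

print(signals.shape, train.shape, test.shape)
```

Keeping all signals in a single (n_rows, 5) array, rather than five separate 1D arrays, matches the shape most libraries expect for multivariate input.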
