Simon Righley

Reputation: 4969

python numpy structured array issue

I'm relatively new to numpy. I have imported data from a .csv file, with dates in the format YYYY,MM,DD followed by some other fields. I would like to put everything into one array, with the dates in the "proper" datetime format. This is my code:

import datetime as dt
import numpy as np

na_trades = np.zeros((number_of_orders,), dtype = ('datetime64,a5,a5,i4'))
for row in range(number_of_orders):
    order = na_trades_csv[row]
    order_date = dt.datetime(order[0],order[1],order[2])
    order_date64 =  np.datetime64(order_date)
    na_trades[row] = (order_date64,order[3],order[4],order[5])

But I'm getting the error "ValueError: error setting an array element with a sequence." Any idea why that is? Thanks in advance for your help!

Upvotes: 1

Views: 722

Answers (2)

unutbu

Reputation: 879093

Using numpy version 1.6.2, dtype = 'datetime64,a5,a5,i4' does not result in the intended dtype:

In [36]: na_trades = np.zeros((number_of_orders,), dtype = 'datetime64,a5,a5,i4')
In [37]: na_trades
Out[37]: array([1970-01-01 00:00:00], dtype=datetime64[us])

This looks like a bug to me -- though I could be wrong. Try instead:

na_trades = np.empty(number_of_orders,
                     dtype = [
                         ('dt', 'datetime64'),
                         ('foo','a5'),
                         ('bar', 'a5'),
                         ('baz', 'i4')])
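
For what it's worth, here is a minimal sketch of the whole fill loop using the explicit field list. The sample rows standing in for na_trades_csv are made up, and the 'datetime64[D]' unit and 'S5' string code are my assumptions for a recent NumPy (the 'a5' spelling is no longer accepted in newer releases), so adjust as needed:

```python
import datetime as dt
import numpy as np

# Hypothetical stand-in for na_trades_csv: (year, month, day, symbol, side, shares)
na_trades_csv = [
    (2011, 1, 10, b"AAPL", b"Buy", 1500),
    (2011, 1, 13, b"AAPL", b"Sell", 1500),
    (2011, 2, 2, b"IBM", b"Buy", 4000),
]
number_of_orders = len(na_trades_csv)

# Explicit (name, type) pairs; the unit [D] and 'S5' are assumptions for recent NumPy.
trade_dtype = np.dtype([
    ("dt", "datetime64[D]"),
    ("foo", "S5"),
    ("bar", "S5"),
    ("baz", "i4"),
])
na_trades = np.empty(number_of_orders, dtype=trade_dtype)

for row, order in enumerate(na_trades_csv):
    order_date = dt.date(order[0], order[1], order[2])
    # One tuple per record: one value for each field, in field order.
    na_trades[row] = (np.datetime64(order_date), order[3], order[4], order[5])
```

Checking na_trades.dtype.names right after creating the array is a quick way to confirm that all four fields are really there before you start assigning rows.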

Upvotes: 2

staticfloat

Reputation: 7040

This is because in numpy arrays (unlike python lists) you cannot assign a sequence to a single element in the array. Python lists are heterogeneous (i.e. different elements can be of different types) and don't really care what you throw into them, whereas numpy arrays have one specific dtype. You're trying to set the dtype to a composite type (i.e. a record with a datetime, two strings and an int), but numpy is ignoring everything after the datetime64 in your dtype string because your syntax is a little off. You end up with a plain datetime64 array, so the tuple you assign is treated as a sequence.
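
To illustrate the difference, here is a small sketch with made-up values (it uses a plain integer array rather than your datetime one, just to keep it simple):

```python
import numpy as np

# A Python list accepts anything in any slot, including a tuple.
mixed = [0, 0, 0]
mixed[0] = (2011, "AAPL", 1500)

# A NumPy array with a scalar dtype holds exactly one value per element.
typed = np.zeros(3, dtype="i4")
try:
    typed[0] = (2011, 1, 10)  # three values into a single int slot
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # the "sequence" error from the question

print(error_message)
```

The failed assignment leaves the array unchanged, which is the same situation you are in once the dtype has collapsed to a single datetime64.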

Try the following:

z = np.zeros((5,), dtype = np.dtype([('time','datetime64'),('year','a5'),('month','a5'),('day','i4')]))

This creates a structured array whose elements are numpy.void records. Each record can be indexed by field name, a bit like a dictionary, or by position. E.g. you can then do the following:

>>> z[0]
(datetime.datetime(1970, 1, 1, 0, 0), '', '', 0)

>>> z[0]['time']
1970-01-01 00:00:00

>>> z[0][0]
1970-01-01 00:00:00
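
The same idea as a short script, in case it helps. This is only a sketch: the 'datetime64[s]' unit, the 'S5' string code and the sample values are my assumptions for a recent numpy, not something from your data:

```python
import numpy as np

z = np.zeros(5, dtype=[("time", "datetime64[s]"),
                       ("year", "S5"),
                       ("month", "S5"),
                       ("day", "i4")])

# Assign a whole record as one tuple, one value per field.
z[0] = (np.datetime64("2012-03-01T09:30:00"), b"2012", b"03", 1)

record = z[0]            # a single record (numpy.void)
by_name = record["time"]  # field access by name
by_position = record[0]   # field access by position
day_column = z["day"]     # one field across all records

print(type(record).__name__, by_name, day_column)
```

Indexing the array itself with a field name, as in z["day"], gives you that field for every record at once, which is usually what you want for further processing.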

Upvotes: 1
