Reputation: 3781

python- add col names to np.array

Why does the following work:

mat = np.array(
    [(0,0,0),
     (0,0,0),
     (0,0,0)],
    dtype=[('MSFT','float'),('CSCO','float'),('GOOG','float') ]
    )

while this doesn't:

mat = np.array(
    [[0]*3]*3,
    dtype=[('MSFT','float'),('CSCO','float'),('GOOG','float')]
    )

TypeError: expected a readable buffer object

How can I create a matrix easily like

[[None]*M]*N

but with tuples in it, so that I can assign names to the columns?

Upvotes: 1

Answers (2)

hpaulj

Reputation: 231425

When I make a zero array with your dtype:

In [548]: dt=np.dtype([('MSFT','float'),('CSCO','float'),('GOOG','float') ])

In [549]: A = np.zeros(3, dtype=dt)

In [550]: A
Out[550]: 
array([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], 
      dtype=[('MSFT', '<f8'), ('CSCO', '<f8'), ('GOOG', '<f8')])

notice that the display shows a list of tuples. That's intentional: it distinguishes the records of a structured dtype from the rows of an ordinary 2d array.

It also means that when you create the array, or assign values to it, you need to supply a list of tuples.

For example, let's make a list of lists:

In [554]: ll = np.arange(9).reshape(3,3).tolist()

In [555]: ll
Out[555]: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
In [556]: A[:]=ll
...
TypeError: a bytes-like object is required, not 'list'

but if I turn it into a list of tuples:

In [557]: llt = [tuple(i) for i in ll]

In [558]: llt
Out[558]: [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

In [559]: A[:]=llt

In [560]: A
Out[560]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0)], 
      dtype=[('MSFT', '<f8'), ('CSCO', '<f8'), ('GOOG', '<f8')])

the assignment works fine. That list can also be passed directly to np.array:

In [561]: np.array(llt, dtype=dt)
Out[561]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0)], 
      dtype=[('MSFT', '<f8'), ('CSCO', '<f8'), ('GOOG', '<f8')])
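If the data starts out as a list of lists or a plain 2d array, the tuple conversion can be done inline, or handed to `numpy.lib.recfunctions.unstructured_to_structured` (available in NumPy 1.16 and later, as far as I know). A minimal sketch; the variable names `from_tuples` and `from_2d` are only illustrative:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('MSFT', 'float'), ('CSCO', 'float'), ('GOOG', 'float')])
ll = np.arange(9).reshape(3, 3).tolist()   # list of lists

# Option 1: turn each inner list into a tuple before calling np.array
from_tuples = np.array([tuple(row) for row in ll], dtype=dt)

# Option 2: let NumPy map the last axis of a 2d float array onto the fields
from_2d = rfn.unstructured_to_structured(np.array(ll, dtype=float), dtype=dt)

print(from_tuples.shape)   # (3,)
print(from_2d['CSCO'])     # the second "column": 1.0, 4.0, 7.0
```

Both give a 1d array of three records, not a 3x3 array.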

Similarly, assigning values to one record requires a tuple, not a list:

In [563]: A[0]=(10,12,14)

The other common way of setting values is on a field by field basis. That can be done with a list or array:

In [564]: A['MSFT']=[100,200,300]

In [565]: A
Out[565]: 
array([(100.0, 12.0, 14.0), (200.0, 4.0, 5.0), (300.0, 7.0, 8.0)], 
      dtype=[('MSFT', '<f8'), ('CSCO', '<f8'), ('GOOG', '<f8')])
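The two kinds of assignment can be put side by side in a standalone script. This is just a sketch that rebuilds the zero array from scratch; note that the result is 1d, so an integer selects a record and a field name selects a "column":

```python
import numpy as np

dt = np.dtype([('MSFT', 'float'), ('CSCO', 'float'), ('GOOG', 'float')])
A = np.zeros(3, dtype=dt)

A[0] = (10, 12, 14)            # one record: must be a tuple
A['MSFT'] = [100, 200, 300]    # one field: a list or array is fine

print(A.shape)          # (3,)
print(A.dtype.names)    # ('MSFT', 'CSCO', 'GOOG')
print(A['CSCO'][0])     # 12.0
print(A[0]['MSFT'])     # 100.0 (overwritten by the field assignment)
```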

The np.rec.fromarrays method recommended in the other answer ends up using the copy-by-fields approach. Its code is, in essence:

arrayList = [sb.asarray(x) for x in arrayList]
<determine shape>
<determine dtype>
_array = recarray(shape, descr)
# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]
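The same copy-by-fields idea can be written as a small runnable function. This is a sketch rather than NumPy's actual code, and `from_columns` is a hypothetical helper name:

```python
import numpy as np

def from_columns(columns, dtype):
    """Hypothetical helper: build a structured array one field at a time."""
    dtype = np.dtype(dtype)
    columns = [np.asarray(c) for c in columns]
    if len(columns) != len(dtype.names):
        raise ValueError("need exactly one column per field")
    # Allocate the records first, then copy each column into its field
    out = np.zeros(columns[0].shape, dtype=dtype)
    for name, col in zip(dtype.names, columns):
        out[name] = col
    return out

dt = [('MSFT', 'float'), ('CSCO', 'float'), ('GOOG', 'float')]
B = from_columns([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dt)

print(B['MSFT'])   # the first input list becomes the MSFT field
print(B[0])        # a record takes one value from each column
```

Each input list here is a column (a field), not a row, which is the opposite of what np.array expects from a list of tuples.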

Upvotes: 4

mtzl

Reputation: 404

If you have a number of 1D arrays (columns) you would like to merge while keeping column names, you can use np.rec.fromarrays:

>>> dt = np.dtype([('a', float),('b', float),('c', float),])
>>> np.rec.fromarrays([[0] * 3 ] * 3, dtype=dt)
rec.array([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

This gives you a record/structured array in which columns can have names and different data types.
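To illustrate the mixed-datatype case, here is a short sketch. The field names and values are made up for the example, and the field types are left for NumPy to infer from the input columns:

```python
import numpy as np

# Illustrative columns: one of strings, one of floats, one of ints
tickers = ['MSFT', 'CSCO', 'GOOG']
prices = [10.5, 20.25, 30.0]
volumes = [100, 200, 300]

rec = np.rec.fromarrays([tickers, prices, volumes],
                        names='ticker,price,volume')

print(rec.dtype.names)        # ('ticker', 'price', 'volume')
print(rec.price)              # fields of a recarray are also attributes
print(rec['volume'].sum())    # 600
```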

Upvotes: 3
