Jakub M.

Reputation: 33857

numpy, named columns

Simple question about numpy:

I load 100 values into a vector a. From this vector, I want to create an array A with 2 columns, where one column is named "C1" and the other "C2", and where one has type int32 and the other int64. An example:

import numpy as np

a = range(100)
A = np.array(a).reshape(len(a) // 2, 2)
# A.dtype = ...?

How do I define the columns' types and names when I create the array from a?

Upvotes: 26

Views: 39822

Answers (2)

unutbu

Reputation: 880399

NumPy structured arrays have named columns:

import numpy as np
    
a = range(100)
A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'),('C2', 'int64')])
print(A.dtype)
# [('C1', '<i4'), ('C2', '<i8')]

You can access the columns by name like this:

print(A['C1'])
# [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
#  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
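
As a small illustration of what named access gives you, here is a sketch that rebuilds the same array and reads single records, single fields, and filtered records. The variable names (pairs, first_row, c2_head, mask, tail) are made up for this example.

```python
import numpy as np

# Same data as above: the pairs (0, 1), (2, 3), ..., (98, 99)
pairs = list(zip(range(0, 100, 2), range(1, 100, 2)))
A = np.array(pairs, dtype=[('C1', 'int32'), ('C2', 'int64')])

first_row = A[0]        # one record; its fields are read by name
c2_head = A['C2'][:3]   # a field behaves like a regular 1D array with its own dtype
mask = A['C1'] >= 96    # comparisons on a field give an ordinary boolean mask
tail = A[mask]          # records whose C1 value is at least 96

print(first_row['C1'], first_row['C2'])
print(c2_head.dtype, c2_head)
print(tail)
```

Each field keeps the type declared in the dtype, so A['C1'] is int32 and A['C2'] is int64.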

Note that using np.array with zip causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip.

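Instead, given a 2D NumPy array A of 64-bit integers, you could use ravel() to flatten A to 1D, then use view to reinterpret each consecutive pair of values as one record of a structured array, and finally use astype to convert the columns to the desired types. The view step only works when the field sizes passed to view match the array's actual item size (int64 here), and astype makes a copy: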

a = range(100)
# Request int64 explicitly so the 'i8' view below matches on every platform
A = np.array(a, dtype='i8').reshape(len(a)//2, 2)
A = A.ravel().view([('col1','i8'),('col2','i8')]).astype([('col1','i4'),('col2','i8')])
print(A[:5])
# [(0, 1) (2, 3) (4, 5) (6, 7) (8, 9)]

print(A.dtype)
# [('col1', '<i4'), ('col2', '<i8')]
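
Another way to avoid the temporary list of tuples, shown here only as a sketch and not as part of the original answer, is to allocate the structured array first and fill each field from a strided slice. The name flat is made up for this example, and it assumes the values alternate between the two columns as in the question.

```python
import numpy as np

a = range(100)
# Build a flat int64 array directly from the iterable
flat = np.fromiter(a, dtype='int64', count=len(a))

# Allocate the structured array once, then fill each named field
A = np.empty(len(a) // 2, dtype=[('C1', 'int32'), ('C2', 'int64')])
A['C1'] = flat[0::2]   # even positions go to C1 (cast to int32 on assignment)
A['C2'] = flat[1::2]   # odd positions go to C2

print(A[:3])
print(A.dtype)
```

This does not depend on the source array having a particular item size, since each field is converted on assignment.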

Upvotes: 24

user2428107

Reputation: 3243

I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.
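
A minimal sketch of what that could look like for the data in the question, assuming pandas is installed. The column names come from the question; the variable name df is just for illustration.

```python
import numpy as np
import pandas as pd

a = range(100)

# Reshape into 50 rows of 2 values, then name the columns
df = pd.DataFrame(np.array(a).reshape(-1, 2), columns=['C1', 'C2'])

# Give each column its own integer type
df = df.astype({'C1': 'int32', 'C2': 'int64'})

print(df.dtypes)
print(df['C1'].head())
```

Columns are then accessed by name, as in df['C1'], and each one keeps its own dtype.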

Upvotes: 11
