codervince

Reputation: 425

Numpy genfromtxt iterate over columns

I am using NumPy's genfromtxt to get columns from a CSV file.

Each column needs to be split and assigned to a separate SQLAlchemy SystemRecord, combined with some other columns and attributes, and added to the DB.

What's the best practice for iterating over the columns f1 to f9 and adding them to the session object?

So far, I have used the following code but I don't want to do the same thing for each f column:

t = np.genfromtxt(FILE_NAME, dtype=[(np.str_, 20)] * 11, delimiter=',',
                  filling_values="None", skip_header=0,
                  usecols=(0,1,2,3,4,5,6,7,8,9,10))

for r in t:
    _acol = r['f1'].split('-')
    _bcol = r['f2'].split('-')
    ....
    arec = t_SystemRecords(first=_acol[0], second=_acol[1], third=_acol[2], ... )
    db.session.add(arec)
    db.session.commit()

Upvotes: 3

Views: 1339

Answers (1)

hpaulj

Reputation: 231665

Look at t.dtype, or at r.dtype for a single row.

Make a sample structured array (which is what genfromtxt returns):

t = np.ones((5,), dtype='i4,i4,f8,S3')

which looks like:

array([(1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'),
       (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1')], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

The dtype and dtype.names are:

In [135]: t.dtype
Out[135]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

In [138]: t.dtype.names
Out[138]: ('f0', 'f1', 'f2', 'f3')

Iterate over the names to see the individual columns:

In [139]: for n in t.dtype.names:
   .....:     print(t[n])
   .....:     
[1 1 1 1 1]
[1 1 1 1 1]
[ 1.  1.  1.  1.  1.]
[b'1' b'1' b'1' b'1' b'1']

Or in your case, iterate over the 'rows', and then iterate over the names:

In [140]: for i,r in enumerate(t):
   .....:     print(r)
   .....:     for n in r.dtype.names:
   .....:         print(r[n])
   .....:         
(1, 1, 1.0, b'1')
1
1
1.0
b'1'
(1, 1, 1.0, b'1')
...
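Applied to string fields like yours, that nested loop could look roughly like the sketch below. The sample array and make_record are made up for illustration; make_record is only a stand-in for your t_SystemRecords(...) call, and I'm assuming every value has at least two '-'-separated parts.

```python
import numpy as np

# Small made-up structured array standing in for the genfromtxt result.
t = np.array([('one-1', 'two-23', 'three-12'),
              ('four-ab', 'five-ss', 'six-ss')],
             dtype=[('f0', 'U20'), ('f1', 'U20'), ('f2', 'U20')])

def make_record(field, parts):
    # Hypothetical stand-in for t_SystemRecords(first=..., second=..., ...).
    return {'field': field, 'first': parts[0], 'second': parts[1]}

records = []
for r in t:                          # each r is one row (a 0d record)
    for name in r.dtype.names:       # 'f0', 'f1', 'f2'
        parts = str(r[name]).split('-')
        records.append(make_record(name, parts))

print(len(records))   # 6: two rows times three fields
print(records[0])
```

To restrict the loop to f1 through f9, slice the names tuple, e.g. t.dtype.names[1:10]. In the real code the append would become db.session.add(...), presumably with a single commit after the loop.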

For r, which is 0d (check r.shape), you can select items by number or iterate:

r[1]  # == r[r.dtype.names[1]]
for i in r: print(i)

For t, which is 1d, this does not work; t[1] references an item (a whole row), not a field.

1d structured arrays behave a bit like 2d arrays, but not quite. The usual talk of row and column has to be replaced with row (or item) and field.
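A quick way to see the item/field distinction is to compare shapes. This is just a sketch using the same np.ones sample as above; the variable names item and field are mine.

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')

item = t[1]       # integer index: one whole record
field = t['f1']   # name index: one field across all records

print(t.shape)      # (5,)  1d, even though there are 4 fields
print(item.shape)   # ()    a 0d record
print(field.shape)  # (5,)  an ordinary 1d array

# On a single record, positional and named access give the same value.
print(item[1] == item['f1'])   # True
```

So there is no t[:, 1] style column indexing; the field name plays the role of the column index.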


To make a t that might be closer to your case:

In [175]: txt=[b'one-1, two-23, three-12',b'four-ab, five-ss, six-ss']

In [176]: t=np.genfromtxt(txt,dtype=[(np.str_,20),(np.str_,20),(np.str_,20)])

In [177]: t
Out[177]: 
array([('one-1,', 'two-23,', 'three-12'),
       ('four-ab,', 'five-ss,', 'six-ss')], 
      dtype=[('f0', '<U20'), ('f1', '<U20'), ('f2', '<U20')])

np.char has string functions that can be applied to an array:

In [178]: np.char.split(t['f0'],'-')
Out[178]: array([['one', '1,'], ['four', 'ab,']], dtype=object)

It doesn't work on the structured array, but does work on the individual fields. That output could be indexed as a list of lists (it's not 2d).
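To avoid repeating the split for each f column, one option is to loop over dtype.names and collect the results in a dict. A minimal sketch, assuming a hand-built array with clean values (no trailing commas) and a dict name, split_fields, that I made up:

```python
import numpy as np

# Made-up sample; in practice this would be the genfromtxt result.
t = np.array([('one-1', 'two-23', 'three-12'),
              ('four-ab', 'five-ss', 'six-ss')],
             dtype=[('f0', 'U20'), ('f1', 'U20'), ('f2', 'U20')])

# Apply np.char.split to each field separately, keyed by field name.
split_fields = {name: np.char.split(t[name], '-') for name in t.dtype.names}

# Each value is a 1d object array of lists, indexed as [row][part].
print(split_fields['f0'][0])      # ['one', '1']
print(split_fields['f2'][1][1])   # ss
```

Whether this beats a plain Python loop over rows depends on how the pieces get combined into records afterwards; np.char.split still loops over the strings internally.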

Upvotes: 2
