I am using NumPy's genfromtxt to get columns from a CSV file.
Each column needs to be split and assigned to a separate SQLAlchemy SystemRecord, combined with some other columns and attributes, and added to the DB.
What's the best practice to iterate over the columns f1 to f9 and add them to the session object?
So far, I have used the following code, but I don't want to repeat the same thing for each f column:
import numpy as np

# 11 string fields of up to 20 characters; genfromtxt names them f0..f10
t = np.genfromtxt(FILE_NAME, dtype=[(np.str_, 20)] * 11,
                  delimiter=',', filling_values="None", skip_header=0,
                  usecols=tuple(range(11)))
for r in t:  # iterate over records; enumerate(t) would yield (index, record) tuples
    _acol = r['f1'].split('-')
    _bcol = r['f2'].split('-')
    # ....
    arec = t_SystemRecords(first=_acol[0], second=_acol[1], third=_acol[2])  # , ...
    db.session.add(arec)
db.session.commit()  # commit once, after all records have been added
Look at t.dtype, or at r.dtype.
Make a sample structured array (which is what genfromtxt returns):
t = np.ones((5,), dtype='i4,i4,f8,S3')
which looks like:
array([(1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'),
       (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])
The dtype and dtype.names are:
In [135]: t.dtype
Out[135]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])
In [138]: t.dtype.names
Out[138]: ('f0', 'f1', 'f2', 'f3')
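The same inspection as a standalone script might look like the sketch below (not part of the original answer). Besides dtype.names it uses dtype.fields, which maps each field name to a tuple starting with that field's dtype and byte offset:

```python
import numpy as np

# Same sample structured array: four fields with default names f0..f3.
t = np.ones((5,), dtype='i4,i4,f8,S3')

# dtype.names gives the field names, in order.
names = t.dtype.names

# dtype.fields maps each name to (field dtype, byte offset[, title]).
for name in names:
    field_dtype, offset = t.dtype.fields[name][:2]
    print(name, field_dtype, offset)
```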
Iterate over the names to see the individual columns:
In [139]: for n in t.dtype.names:
   .....:     print(t[n])
   .....:
[1 1 1 1 1]
[1 1 1 1 1]
[ 1. 1. 1. 1. 1.]
[b'1' b'1' b'1' b'1' b'1']
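If you want to keep those columns around rather than just print them, one option (a sketch, not from the original answer) is a dict keyed by field name. Each t[name] is a view into t, so no data is copied:

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')

# Map each field name to its column: a 1d view with one element per record.
columns = {name: t[name] for name in t.dtype.names}

for name, col in columns.items():
    print(name, col.dtype, col.shape)
```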
Or, in your case, iterate over the 'rows', and then over the names:
In [140]: for i,r in enumerate(t):
   .....:     print(r)
   .....:     for n in r.dtype.names:
   .....:         print(r[n])
   .....:
(1, 1, 1.0, b'1')
1
1
1.0
b'1'
(1, 1, 1.0, b'1')
...
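For building database objects, the row-then-names loop can produce one dict per record, keyed by field name, which is the shape you would pass as keyword arguments to a constructor. A minimal sketch using the sample array (the values are NumPy scalars, not plain Python numbers):

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')

# Iterating t directly yields records; no enumerate needed unless you want the index.
rows = []
for r in t:
    rows.append({n: r[n] for n in r.dtype.names})

print(rows[0])
```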
For r, which is 0d (check r.shape), you can select items by number or iterate over them:
r[1]  # == r[r.dtype.names[1]]
for i in r: print(i)
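A standalone check of those record operations, as a sketch on the sample array. It also uses .item(), which turns a record into a plain Python tuple:

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')
r = t[0]  # one record: a 0d np.void scalar

print(r.shape)  # empty tuple: the record has no dimensions

# Integer indexing selects a field by position, same as indexing by its name.
second = r[1]
same = r[r.dtype.names[1]]

# Iterating a record yields its field values in order.
values = [v for v in r]

# .item() converts the record into a plain Python tuple.
as_tuple = r.item()
print(values, as_tuple)
```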
For t, which is 1d, this does not work: t[1] references an item (a whole record), not a field.
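The difference in shapes makes this concrete. A small sketch: an integer index on t gives one record, while a field name gives one column:

```python
import numpy as np

t = np.ones((5,), dtype='i4,i4,f8,S3')

row = t[1]     # integer index -> one record (np.void, 0d)
col = t['f1']  # field name -> one column (1d, one value per record)
print(t.shape, row.shape, col.shape)
```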
1d structured arrays behave a bit like 2d arrays, but not quite. The usual talk of rows and columns has to be replaced with rows (or items) and fields.
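If you really need a 2d array, one option not mentioned in the answer is numpy.lib.recfunctions.structured_to_unstructured (available since NumPy 1.16), which casts the selected fields to a common dtype. A sketch, assuming only the numeric fields are wanted; mixing in the bytes field would need a different target dtype:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

t = np.ones((5,), dtype='i4,i4,f8,S3')

# Multi-field index picks the numeric fields; the conversion then yields
# a genuine 2d array (records x fields) with a common dtype (float64 here).
numeric = t[['f0', 'f1', 'f2']]
arr2d = rfn.structured_to_unstructured(numeric)
print(arr2d.shape, arr2d.dtype)
```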
To make a t that might be closer to your case:
In [175]: txt=[b'one-1, two-23, three-12',b'four-ab, five-ss, six-ss']
In [176]: t=np.genfromtxt(txt,dtype=[(np.str_,20),(np.str_,20),(np.str_,20)])
In [177]: t
Out[177]:
array([('one-1,', 'two-23,', 'three-12'),
       ('four-ab,', 'five-ss,', 'six-ss')],
      dtype=[('f0', '<U20'), ('f1', '<U20'), ('f2', '<U20')])
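Without a delimiter, genfromtxt splits on whitespace, which is why the values above keep their trailing commas. Since the real file is comma-delimited, a variant passing delimiter=',' (as the question does) together with autostrip=True should give clean fields. A sketch, assuming input lines shaped like the sample; 'U20,U20,U20' is just a compact spelling of three 20-character string fields:

```python
import numpy as np

txt = ['one-1, two-23, three-12', 'four-ab, five-ss, six-ss']

# Split on commas and strip the surrounding spaces from each value.
t = np.genfromtxt(txt, dtype='U20,U20,U20', delimiter=',', autostrip=True)
print(t)
```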
np.char has string functions that can be applied to an array:
In [178]: np.char.split(t['f0'],'-')
Out[178]: array([['one', '1,'], ['four', 'ab,']], dtype=object)
It doesn't work on the structured array as a whole, but it does work on the individual fields. The result is a 1d object array whose elements are lists, so index it like a list of lists (it's not 2d).
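Putting the pieces together for the original question, one possible sketch: split every field once with np.char.split, then build keyword arguments per record. The key names like 'f0_part0' are hypothetical; map the pieces onto your real SystemRecord columns. The SQLAlchemy calls are shown only as comments, and SystemRecord and db are assumed to come from your own application:

```python
import numpy as np

txt = ['one-1, two-23, three-12', 'four-ab, five-ss, six-ss']
t = np.genfromtxt(txt, dtype='U20,U20,U20', delimiter=',', autostrip=True)

# Split each field column once; each result is a 1d object array of lists.
split_cols = {name: np.char.split(t[name], '-') for name in t.dtype.names}

records = []
for i in range(t.shape[0]):
    kwargs = {}
    for name, col in split_cols.items():
        # col[i] is a Python list of the '-'-separated pieces for this record.
        for j, piece in enumerate(col[i]):
            kwargs[f'{name}_part{j}'] = piece  # hypothetical key naming
    records.append(kwargs)
    # With SQLAlchemy, something like: db.session.add(SystemRecord(**kwargs))
# ...followed by a single db.session.commit() after the loop.

print(records[0])
```

Splitting whole columns up front keeps the per-record loop free of repeated per-field code, which was the duplication the question wanted to avoid.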