Reputation: 133
I have a large array that I imported from a CSV file (with np.recfromcsv), and I want to divide it into smaller arrays by an ID column in that array.
For example, my array (a) looks like:
[(842, 129826, 2018, 7246, '1/4/2009', 452, '1/4/2009', 452, '1/4/2009')
(863, 129827, 2018, 7246, '1/7/2009', 452, '1/7/2009', 452, '1/7/2009')
(890, 129828, 2019, 7246, '1/11/2009', 452, '1/11/2009', 452, '1/11/2009')
...,
(339, 131268, 1085, 4211, '12/1/2009', 220, '12/2/2009', 220, '12/1/2009')
(376, 131535, 1085, 4211, '12/8/2009', 220, '12/9/2009', 220, '12/8/2009')
(470, 131536, 1087, 4211, '12/28/2009', 220, '12/29/2009', 220, '12/28/2009')]
I would like to split this into arrays based on the third column (2018, 2019, 1085, etc.). I've been trying to use numpy's vsplit method with a list of unique ID values that I generated (id_list = list(set(a['id']))), but I get the error: ValueError: vsplit only works on arrays of 2 or more dimensions. This makes me think the np.recfromcsv tool doesn't generate dimensions properly. Should I be using a different import tool?
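A minimal sketch of why the error appears, using a small hypothetical structured array in place of the real CSV data (every field name except 'id' is made up): the array is one-dimensional, one entry per record, so np.vsplit rejects it. Splitting by value can instead be done by sorting on 'id' and cutting with np.split wherever the ID changes; note this swaps in np.argsort, np.unique and np.split rather than vsplit.

```python
import numpy as np

# Hypothetical stand-in for the CSV data: only the first four columns,
# and every field name except 'id' is an assumption.
a = np.array(
    [(842, 129826, 2018, 7246),
     (863, 129827, 2018, 7246),
     (890, 129828, 2019, 7246),
     (339, 131268, 1085, 4211),
     (376, 131535, 1085, 4211),
     (470, 131536, 1087, 4211)],
    dtype=[('f0', int), ('f1', int), ('id', int), ('f3', int)],
)

print(a.ndim, a.shape)  # 1 (6,): a 1-D array of records, not a 2-D table

try:
    np.vsplit(a, 2)
except ValueError as err:
    print(err)  # vsplit only works on arrays of 2 or more dimensions

# Split by value instead: sort the records by 'id' (a stable sort keeps the
# file order within each ID), then cut wherever a new ID starts.
order = np.argsort(a['id'], kind='stable')
sorted_a = a[order]
ids, starts = np.unique(sorted_a['id'], return_index=True)
groups = np.split(sorted_a, starts[1:])

for id_value, group in zip(ids, groups):
    print(id_value, len(group))
```

Each element of groups is still a structured array, so fields such as group['id'] keep working on the pieces.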
I have also tried doing this in a simple loop:
for e in id_list:
    name = "id" + str(e)
    name = a[a['id']==e]
But this generates an error: SyntaxError: can't assign to operator. I know the problem is the dynamically named variable, but I see no other way to achieve this without overwriting the array for each ID.
I'd really appreciate advice on how to figure this out.
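A common way around dynamically named variables is a dictionary keyed by the would-be name. The sketch below assumes a small hypothetical structured array (the field name 'num' is made up; 'id' comes from the question) and builds one entry per unique ID, so nothing gets overwritten inside the loop.

```python
import numpy as np

# Hypothetical sample; only the 'id' field name comes from the question.
a = np.array(
    [(842, 2018), (863, 2018), (890, 2019), (339, 1085), (376, 1085), (470, 1087)],
    dtype=[('num', int), ('id', int)],
)

# One dict entry per unique ID replaces variables such as id2018, id2019, ...:
# the generated name becomes a key instead of a variable name.
groups = {"id" + str(e): a[a['id'] == e] for e in np.unique(a['id'])}

print(sorted(groups))           # ['id1085', 'id1087', 'id2018', 'id2019']
print(groups['id2018']['num'])  # [842 863]
```

Plain integer keys (groups[2018]) would work just as well; the "id" prefix only mirrors the naming attempted in the loop.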
Upvotes: 3
Views: 3226
Reputation: 58885
To read a column from a recarray, you don't pass an index but the field name, for example:
my_col = a['id']
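A small sketch of the difference, assuming a hypothetical structured array whose field names other than 'id' are invented: indexing with a string selects a whole column, while an integer index selects a single record (row). If only the column position is known, the name can be looked up in a.dtype.names.

```python
import numpy as np

# Hypothetical sample; 'f0' and 'f1' are made-up field names.
a = np.array(
    [(842, 129826, 2018), (863, 129827, 2018), (890, 129828, 2019)],
    dtype=[('f0', int), ('f1', int), ('id', int)],
)

print(a.dtype.names)        # ('f0', 'f1', 'id')
print(a['id'])              # [2018 2018 2019]: a whole column, by field name
print(a[2])                 # one record (the third row), not the third column
print(a[a.dtype.names[2]])  # the third column, reached through its name
```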
So your command would be:
id_list = list(set(a['id']))
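As a side note, np.unique gives the same unique values as list(set(...)) but returns them as a sorted NumPy array, which keeps the order of the resulting groups predictable. A minimal sketch with a hypothetical column of IDs:

```python
import numpy as np

# Hypothetical 'id' column values.
ids = np.array([2018, 2018, 2019, 1085, 1085, 1087])

id_list = list(set(ids))     # unique values, in no guaranteed order
unique_ids = np.unique(ids)  # unique values, sorted, as a NumPy array

print(unique_ids)            # [1085 1087 2018 2019]
```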
Just as an observation.
The recfromcsv() function works properly: it returns a 1-D structured array (or record array) with one entry per row, and each field in it behaves like a 1-D array, which is why vsplit rejects it. Maybe you could try using np.loadtxt() with delimiter=',', which will return a 2-D array.
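A rough sketch of the np.loadtxt route, assuming a hypothetical numeric-only CSV: the question's date columns are left out because np.loadtxt converts every field to a single dtype (float by default), so text dates would fail to parse. The result is 2-D, so np.vsplit accepts it, but vsplit still cuts by row position rather than by ID; grouping by the third column still needs a mask per unique value.

```python
import io
import numpy as np

# Hypothetical numeric-only CSV (date columns from the question omitted).
csv_text = """842,129826,2018,7246
863,129827,2018,7246
890,129828,2019,7246
339,131268,1085,4211
"""

b = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=int)
print(b.ndim, b.shape)  # 2 (4, 4)

# vsplit now works, but it splits by row position, not by ID.
halves = np.vsplit(b, 2)

# Grouping by the third column (index 2) uses one boolean mask per unique value.
parts = {key: b[b[:, 2] == key] for key in np.unique(b[:, 2])}
print(sorted(parts))
```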
Upvotes: 1