Split numpy array by unique values in column

Question

I have a large array, imported from a CSV with np.recfromcsv, that I want to divide into smaller arrays by an ID column. For example, my array (a) looks like:

[(842, 129826, 2018, 7246, '1/4/2009', 452, '1/4/2009', 452, '1/4/2009')
 (863, 129827, 2018, 7246, '1/7/2009', 452, '1/7/2009', 452, '1/7/2009')
 (890, 129828, 2019, 7246, '1/11/2009', 452, '1/11/2009', 452, '1/11/2009')
 ...,
 (339, 131268, 1085, 4211, '12/1/2009', 220, '12/2/2009', 220, '12/1/2009')
 (376, 131535, 1085, 4211, '12/8/2009', 220, '12/9/2009', 220, '12/8/2009')
 (470, 131536, 1087, 4211, '12/28/2009', 220, '12/29/2009', 220, '12/28/2009')]

I would like to split this into separate arrays based on the third column (2018, 2019, 1085, etc.). I've been trying to use numpy's vsplit function with a list of unique ID values (id_list = list(set(a['id']))), but I get the error: ValueError: vsplit only works on arrays of 2 or more dimensions. This makes me think np.recfromcsv doesn't generate dimensions properly. Should I be using a different import tool?
I have also tried doing this in a simple loop:

for e in id_list:
    name = "id" + str(e)
    name = a[a['id']==e]

But this generates an error: SyntaxError: can't assign to operator. I know the problem is the dynamically named variable, but I see no other way to do this without overwriting the array on each pass through the loop.

I'd really appreciate advice on how to figure this out.

Answers (1)
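
The vsplit error is expected rather than an import problem: a record array holds one record per element, so it is 1-D, and vsplit also splits by position, not by value. Below is a minimal sketch of two ways to group by ID without dynamic variable names, storing the pieces in a dict keyed by ID. It assumes a small made-up structured array in place of the real data; the field names "row" and "key" are hypothetical, and only "id" comes from the question.

```python
import numpy as np

# Hypothetical stand-in for the imported data: a 1-D structured array
# with three of the columns ("row" and "key" are made-up field names).
a = np.array(
    [(842, 129826, 2018), (863, 129827, 2018), (890, 129828, 2019),
     (339, 131268, 1085), (376, 131535, 1085), (470, 131536, 1087)],
    dtype=[("row", "i8"), ("key", "i8"), ("id", "i8")],
)

# Approach 1: one boolean mask per unique ID, collected in a dict
# instead of dynamically named variables.
groups = {int(e): a[a["id"] == e] for e in np.unique(a["id"])}

# Approach 2: sort once by ID, then cut at the first index of each ID.
# np.split (unlike vsplit) accepts 1-D arrays.
order = np.argsort(a["id"], kind="stable")  # stable keeps original row order
sorted_a = a[order]
ids, starts = np.unique(sorted_a["id"], return_index=True)
chunks = np.split(sorted_a, starts[1:])
groups_sorted = dict(zip(ids.tolist(), chunks))
```

Each sub-array is then available as groups[2018], groups[1085], and so on. The mask version is simpler; the sort-and-split version scans the array once, which may matter when there are many distinct IDs.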