Reputation: 490
I have about 40,000 groups after running:
groups=data.groupby('A')
I need to subdivide them into sub-groups of 10,000 groups each, without overlap and keeping the groupby structure, something like group1=groups[0:10000], group2=groups[10000:20000], ... so that I can reuse them in other scripts. How can I do that?
Thank you!
Upvotes: 0
Views: 651
Reputation: 1102
In that case you can simply slice using iloc. A GroupBy object has no iloc, so apply it to the DataFrame itself:
group1=data.iloc[0:10000,:]
group2=data.iloc[10000:20000,:]
...
group4=data.iloc[30000:40000,:]
This works when you want to slice by row position, i.e. by the number of rows required. Note that it selects rows, not groups.
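To slice by group number rather than by row position, one option is `GroupBy.ngroup()`, which labels each row with the ordinal of its group. Below is a minimal sketch, assuming a small made-up frame and a chunk size of 2 groups in place of 10,000. The helper name `split_groups` and the toy data are illustrative only and not part of the question.

```python
import pandas as pd

def split_groups(data, key, groups_per_chunk):
    """Return a list of GroupBy objects, each covering at most groups_per_chunk groups."""
    # ngroup() numbers the groups 0..n-1 in sorted key order (the groupby default)
    group_number = data.groupby(key).ngroup()
    # Integer division maps consecutive group numbers to the same chunk
    chunk_id = group_number // groups_per_chunk
    # Split the rows by chunk, then regroup each piece by the original key
    return [part.groupby(key) for _, part in data.groupby(chunk_id)]

# Toy data: 5 distinct values of 'A' (a stand-in for the 40,000 groups)
data = pd.DataFrame({"A": list("aabbcddee"), "B": range(9)})
chunks = split_groups(data, "A", groups_per_chunk=2)
group1, group2, group3 = chunks
print(group1.ngroups, group2.ngroups, group3.ngroups)
```

Each element of `chunks` is itself a groupby object, so it can be iterated or aggregated like the original `groups`.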
If you want to do it category-wise, then after performing the groupby with an aggregation you can simply do this:
groups=data.groupby('A').sum()  # sum is only an example aggregation
group1=groups.loc['category 1']
The code in the question does not apply an aggregation after groupby, so it yields a GroupBy object rather than a DataFrame. Refer to the groupby documentation to see how groupby works.
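As a rough illustration of the category-wise lookup, the sketch below uses made-up data and `sum` as an assumed aggregation. It also shows `get_group`, which returns the unaggregated rows of one category.

```python
import pandas as pd

# Toy data; the category labels and values are invented for illustration
data = pd.DataFrame({
    "A": ["category 1", "category 2", "category 1"],
    "B": [10, 20, 30],
})
groups = data.groupby("A")

# Aggregated result: one row per category, looked up by label
totals = groups.sum()
print(totals.loc["category 1"])

# Unaggregated rows belonging to a single category
rows = groups.get_group("category 1")
print(rows)
```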
Upvotes: 1
Reputation: 571
Unless you're aggregating right afterwards, groupby might be overkill for this task.
data = data.set_index('A')
group_idx = data.index.drop_duplicates()
sub_group_1 = data.loc[group_idx[:10000]]
This will get you the rows of the first 10,000 groups, in order of first appearance of each value of A.
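The same idea extends to every block of groups. Here is a sketch under the assumption that toy data and a block size of 2 stand in for the real frame and 10,000. The names `indexed`, `block_size`, `sub_groups` and `regrouped` are illustrative.

```python
import pandas as pd

# Toy data: 5 distinct values of 'A' in arbitrary order
data = pd.DataFrame({"A": [3, 1, 3, 2, 1, 5, 4], "B": range(7)})
indexed = data.set_index("A")
# Unique keys, in order of first appearance
group_idx = indexed.index.drop_duplicates()

block_size = 2  # stand-in for 10000
# One DataFrame per block of keys; blocks do not overlap
sub_groups = [
    indexed.loc[group_idx[start:start + block_size]]
    for start in range(0, len(group_idx), block_size)
]

# If the groupby structure is needed, regroup each block on its index
regrouped = [sub.groupby(level="A") for sub in sub_groups]
print([g.ngroups for g in regrouped])
```

Each entry of `sub_groups` is a plain DataFrame, so it can be saved and reloaded from other scripts, then regrouped there as shown in the last step.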
Upvotes: 1