Reputation: 96264
Say I have some random data frame:
> df
A B C D
0 foo one 1.344866 -0.602697
1 bar one 0.669491 -0.264758
2 foo two 0.830100 0.381644
3 bar three -0.756694 -0.382337
4 foo two -0.267778 0.963123
5 bar two 1.275177 -0.667924
6 foo one 0.240863 0.321022
7 foo three -1.431863 -0.333058
And I partition it according to:
groups = df.groupby(['A', 'B'])
What is the difference between the following two methods? They return group information in different formats.
for key, value in groups:
    print key
    print value
and
for group_ix in xrange(groups.ngroups):
    item = groups.nth(group_ix)
?
Upvotes: 2
Views: 94
Reputation: 375435
These two things are quite different: nth
takes the nth value in each group (currently with NaNs if the group doesn't have an nth item):
In [11]: groups.nth(n=0) # the 0th items in each group
Out[11]:
C D
A B
bar one 0.669491 -0.264758
three -0.756694 -0.382337
two 1.275177 -0.667924
foo one 1.344866 -0.602697
three -1.431863 -0.333058
two 0.830100 0.381644
In [12]: groups.nth(n=1) # the 1st items in each group, NaNs if <=1
Out[12]:
C D
A B
bar one NaN NaN
three NaN NaN
two NaN NaN
foo one 0.240863 0.321022
three NaN NaN
two -0.267778 0.963123
Note: at the moment this isn't particularly well documented; there is an open issue to change that and to tweak the behaviour of nth with a Series groupby (so that it behaves like cumcount() == n).
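For comparison, cumcount is already available on a groupby, and a boolean mask built from it picks out the nth row of each group without the NaN padding. A minimal sketch, reusing df and groups from the question:
# cumcount() numbers the rows within each group 0, 1, 2, ...
mask = groups.cumcount() == 1
print df[mask]  # the second row of each group; groups with fewer rows simply drop out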
When you iterate over groups, you get the keys (the MultiIndex tuples) and the values (the sub-DataFrame for each group):
In [21]: for k, v in groups: print k  # each v is the sub-DataFrame for that group
('bar', 'one')
('bar', 'three')
('bar', 'two')
('foo', 'one')
('foo', 'three')
('foo', 'two')
In [22]: groups.get_group(('foo', 'one')) # example v
Out[22]:
A B C D
0 foo one 1.344866 -0.602697
6 foo one 0.240863 0.321022
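The iteration in the question yields exactly these (key, sub-DataFrame) pairs, so you can, for example, collect them into a dict keyed by the group tuples (a quick sketch using df and groups from the question):
pieces = dict(iter(groups))     # {('bar', 'one'): sub-DataFrame, ('bar', 'three'): sub-DataFrame, ...}
print pieces[('foo', 'one')]    # the same frame as get_group(('foo', 'one')) above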
Upvotes: 3