Reputation: 27
I have searched through the other questions, but none of them address this problem. What I want to do is manipulate the groups of a groupby object directly.
Let's assume I have the following data frame:
     A  B  C    Bg
0    1  X  1  None
1    2  A  7  None
2    3  X  9     1
3    4  X  1     1
4    5  B  1  None
5    6  X  0  None
6    7  C  8  None
7    8  A  5  None
8    9  X  9     2
9   10  X  4     2
10  11  X  2     2
11  12  A  4  None
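For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame. The construction code isn't shown in the question, so this is an assumption: it treats Bg as an object column holding Python integers and None, which is what the printout suggests.

```python
import pandas as pd

# Assumed reconstruction of the frame printed above (not the asker's own code).
df2 = pd.DataFrame({
    'A': range(1, 13),
    'B': list('XAXXBXCAXXXA'),
    'C': [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
})
# dtype=object keeps None as None instead of converting the column to float/NaN.
df2['Bg'] = pd.Series(
    [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
    dtype=object,
)

print(df2)
```

Rows whose Bg is None are dropped by groupby by default, so only groups 1 and 2 appear later.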
It is then grouped by the 'Bg' column:
groups = df2.groupby('Bg')
for name, group in groups:
    print('name:', name, '\n', group, '\n\n')
The groups look like this:
name: 1
   A  B  C  Bg
2  3  X  9   1
3  4  X  1   1
name: 2
     A  B  C  Bg
8    9  X  9   2
9   10  X  4   2
10  11  X  2   2
I wrote the following code to perform some tasks and manipulate the groups:
import copy
import numpy as np

groups3 = copy.deepcopy(groups)
for name, group in groups3:
    idx_first = group.index[0]
    idx_last = group.index[-1]
    if name == 2:
        # drop the first index label of group 2
        groups3.groups[name] = np.delete(groups3.groups[name], range(0, 1), axis=0)
    else:
        # remove every other group
        del groups3.groups[name]
print('groups', groups3.groups)
print('-------')
for name, group in groups3:
    print(group)
and the output is:
groups {2: Int64Index([9, 10], dtype='int64')}
-------
   A  B  C  Bg
2  3  X  9   1
3  4  X  1   1
     A  B  C  Bg
8    9  X  9   2
9   10  X  4   2
10  11  X  2   2
However, I'm expecting this in the output:
groups {2: Int64Index([9, 10], dtype='int64')}
-------
     A  B  C  Bg
9   10  X  4   2
10  11  X  2   2
Upvotes: 2
Views: 881
Reputation: 294546
This is a seriously messy rabbit hole...
Short Story
Iterating over a groupby object isn't driven by the dictionary returned by groups, so editing that dictionary doesn't change what the loop yields.
It starts with def __iter__
def __iter__(self):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    return self.grouper.get_iterator(self.obj, axis=self.axis)
Then to def get_iterator
def get_iterator(self, data, axis=0):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    splitter = self._get_splitter(data, axis=axis)
    keys = self._get_group_keys()
    for key, (i, group) in zip(keys, splitter):
        yield key, group
That references _get_splitter and _get_group_keys. In both of these we see group_info, which returns an obscure and well-protected tuple of things that control the iteration. I couldn't figure out how to completely control the iteration, but I could mess it up.
a, b, c = groups3.grouper.group_info
a[a==1] = -1
for name, group in groups3:
    print(group)
   A  B  C  Bg
2  3  X  9   1
3  4  X  1   1
Empty DataFrame
Columns: [A, B, C, Bg]
Index: []
My advice... Don't Do This!
Option 1
filter, then groupby again
df2.groupby('Bg').filter(lambda x: x.name != 2).groupby('Bg')
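Here is a rough, self-contained sketch of Option 1. The frame is an assumed reconstruction of the one in the question (Bg as an object column of integers and None), and the comparison uses the integer 2 because the question's group keys print as integers. The names kept and regrouped are only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df2 = pd.DataFrame({
    'A': range(1, 13),
    'B': list('XAXXBXCAXXXA'),
    'C': [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
})
df2['Bg'] = pd.Series(
    [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
    dtype=object,
)

# filter keeps the rows of every group for which the function returns True;
# each group passed to the function carries its key in the .name attribute.
kept = df2.groupby('Bg').filter(lambda g: g.name != 2)

# Grouping the filtered rows again gives a groupby object without group 2.
regrouped = kept.groupby('Bg')
for name, group in regrouped:
    print(name)
    print(group)
```

Flip the condition to g.name == 2 if group 2 is the one you want to keep.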
Option 2
dictionary comprehension
{name: group for name, group in groups3 if name != 2}
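The same idea can be pointed at the output the question is asking for: keep only group 2 and drop its first row. This is a sketch under the same assumptions as before (reconstructed frame, integer group keys), and trimmed is just an illustrative name. It builds new objects rather than editing the groupby in place.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df2 = pd.DataFrame({
    'A': range(1, 13),
    'B': list('XAXXBXCAXXXA'),
    'C': [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
})
df2['Bg'] = pd.Series(
    [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
    dtype=object,
)

groups = df2.groupby('Bg')

# Keep only group 2 and slice off its first row; all other groups are discarded.
trimmed = {name: group.iloc[1:] for name, group in groups if name == 2}

for name, group in trimmed.items():
    print(group)
```

The result is a plain dict of DataFrames, holding rows 9 and 10 under key 2. If a groupby object is needed afterwards, pd.concat(trimmed.values()).groupby('Bg') should give one.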
Upvotes: 3