Reputation: 643
I have a DataFrame with an index called _ItemId and a _ChannelId column, plus other value columns simplified below as _SomeValue. I would like to transform the DataFrame to have exactly the same _ItemId elements for each _Channel modality (which I refer to as "balancing", like in panel data).
In the example below, in the df that I want, 3 rows were inserted with Nan values to balance the DataFrame.
I thought I would use the reindex() function but I am uncertain on how to create the index I need to insert the missing elements. Any suggestions much appreciated.
What I have:
_ChannelId _SomeValue
_ItemId
6559085 MICRO AA
6589413 MICRO AB
6589421 MICRO AA
6781144 MICRO AC
8184089 MICRO AA
6559085 WILSON AA
6589413 WILSON AC
What I want:
_ChannelId _SomeValue
_ItemId
6559085 MICRO AA
6589413 MICRO AB
6589421 MICRO AA
6781144 MICRO AC
8184089 MICRO AA
6559085 WILSON AA
6589413 WILSON AC
6589421 WILSON NaN
6781144 WILSON NaN
8184089 WILSON NaN
Thanks
Upvotes: 1
Views: 485
Reputation: 40973
First get the indexes of the group you want to take as reference, for example:
idx = df.groupby('_ChannelId').groups['MICRO']
May be you want the union, unique indexes, indexes of the first group, etc, it is not clearly specified in the question. Then you can do:
df.groupby('_ChannelId')\
.apply(lambda x: x.reindex(idx))\
.drop('_ChannelId', axis=1)\
.reset_index('_ChannelId')
_ChannelId _SomeValue
_ItemId
6559085 MICRO AA
6589413 MICRO AB
6589421 MICRO AA
6781144 MICRO AC
8184089 MICRO AA
6559085 WILSON AA
6589413 WILSON AC
6589421 WILSON NaN
6781144 WILSON NaN
8184089 WILSON NaN
Upvotes: 1