Reputation: 4398
I have a dataset that has several rows and columns, however within the column labeled, 'active', I wish to remove the aggregation and separate this into its own unique count. I also wish to add a column that gives this count a unique id.
Data
Pair gen box date active
sox black bl 2021 3
sox red re 2021 2
Desired
Pair gen box date active count
sox black bl 2021 1 b101
sox black bl 2021 1 bl02
sox black bl 2021 1 bl02
sox red re 2021 1 re01
sox red re 2021 1 re02
The 'active' column now has 3 and 2 distinct rows respectively, along with a distinct count ID (based on the 'box' column) instead of an aggregation.
Doing
A SO assisted with a similar dilemma, and I am taking this approach:
# Melt Table Into New Form
df = df.melt(col_level=0, value_name='count', var_name='bl')
# Repeat Based on Count
df = df.reindex(df.index.repeat(df['count']))
# Set Count To 1
df['count'] = 1
# Add Suffix to Each ID
df['ID'] = df['ID'] + (
(df.groupby('ID').cumcount() + 1)
.astype(str)
.str.zfill(2)
)
This works, however, I am not able to retain the other columns as well as set individual var names such as:
bl01
bl02
re01
re02
Any suggestion or advice is appreciated
Upvotes: 2
Views: 71
Reputation: 23099
another method would be a concat along your axis, then use cumcount to create your count column.
dfs = pd.concat([pd.concat([y.assign(active=1,)] * act)
for (idx,act),y in df.groupby([df.index, df['active']])])
dfs['count'] = dfs['box'] + (dfs.groupby('box').cumcount() + 1).astype(str).str.zfill(2)
print(dfs)
Pair gen box date active count
0 sox black bl 2021 1 bl01
0 sox black bl 2021 1 bl02
0 sox black bl 2021 1 bl03
1 sox red re 2021 1 re01
1 sox red re 2021 1 re02
Upvotes: 1
Reputation: 16683
You don't need melt
for this problem and can reindex
and repeat
with the following. Keep in mind that for the count
column, I realize that if each group has a size > 10
that it would show up as bl010
as an example, when you really want bl10
. The number formatting works with the provided sample data though with group size less than 10, and with group size > 10, it is not clear what the desired output is:
df = df.reindex(df.index.repeat(df['active'])).assign(active=1)
df['count'] = df['box'] + '0' + (df.groupby(['Pair', 'gen', 'box']).cumcount() + 1).astype(str)
df
Out[1]:
Pair gen box date active count
0 sox black bl 2021 1 bl01
0 sox black bl 2021 1 bl02
0 sox black bl 2021 1 bl03
1 sox red re 2021 1 re01
1 sox red re 2021 1 re02
Upvotes: 4