Yash
Yash

Reputation: 81

Group By of Dataframe by Index and value in python

This is similar to what i have, except there are many more columns and rows to the dataframe.

df =

   1  2  3
A  0  0  1
B  1  0  0
C  0  1  0
D  0  1  0
E  1  0  0
F  0  1  0
G  0  0  1

My Desired Output:

result =

   1  2  3
0  B  C  A
1  E  D  G
2     F 

What is best and fastest way i can accomplish this? I cannot possibly go through each row to and iterate through it as the actual dataframe has 1000s of rows and columns.

Thank you in advance.

Upvotes: 1

Views: 86

Answers (2)

tozCSS
tozCSS

Reputation: 6114

This is a single line answer -- no need to groupby, cumsum, transform, melt, pivot, etc. like in the accepted answer. For the sake of clarity I'll put it into two lines:

res = df.mask(df.astype(bool),df.index).replace(0,np.nan)\
      .apply(lambda x: x.dropna().reset_index(drop=True))

The first line above creates the res dataframe by replacing 1s with the corresponding index value and replaces 0s with NaNs:

     1    2    3
A  NaN  NaN    A
B    B  NaN  NaN
C  NaN    C  NaN
D  NaN    D  NaN
E    E  NaN  NaN
F  NaN    F  NaN
G  NaN  NaN    G

The second line drops the NaNs in each column, resets each column's index, and combines them back:

     1  2    3
0    B  C    A
1    E  D    G
2  NaN  F  NaN

The full code below for others to easily replicate:

from io import StringIO
import numpy as np

df ="""\
   1  2  3
A  0  0  1
B  1  0  0
C  0  1  0
D  0  1  0
E  1  0  0
F  0  1  0
G  0  0  1
"""
df = pd.read_csv(StringIO(df),sep='\s+')

res = df.mask(df.astype(bool),df.index).replace(0,np.nan)
res.apply(lambda x: x.dropna().reset_index(drop=True))
     1  2    3
0    B  C    A
1    E  D    G
2  NaN  F  NaN

Upvotes: 1

Andreas
Andreas

Reputation: 9197

You can combine pd.melt and pd.groupby to achive that:

Sample data:

import pandas as pd
d = {'1': {'A': 0, 'B': 1, 'C': 0, 'D': 0, 'E': 1, 'F': 0, 'G': 0},
 '2': {'A': 0, 'B': 0, 'C': 1, 'D': 1, 'E': 0, 'F': 1, 'G': 0},
 '3': {'A': 1, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0, 'G': 1}}
df = pd.DataFrame(d).reset_index()

Code:

df = df.melt(id_vars='index')
df = df[df['value'].eq(1)]
df['row'] = df.groupby(['variable'])['value'].transform('cumsum')
df.pivot(index='row', columns='variable', values='index')

Output:

variable    1  2    3
row                  
1           B  C    A
2           E  D    G
3         NaN  F  NaN

Upvotes: 1

Related Questions