rlee300
rlee300

Reputation: 69

Grouping Multiple Columns into a Few Columns-Pandas Dataframes

I have a dataframe that has dummy variables; however, I want to group multiple columns(100+) into a few amount of columns. Any suggestion(s) would be much appreciated. Thanks! For example:

   A_1 | A_2 | A_3| B_1|  B_2|  B_3|  C_1|  C_2|  C_2|
0 |   0|    1|   0|   0|    0|    0|    0|    0|    0|
1 |   0|    0|   0|   0|    0|    1|    0|    1|    0|
2 |   0|    0|   0|   0|    0|    0|    1|    0|    0|
3 |   0|    0|   0|   0|    1|    0|    0|    0|    0|
4 |   1|    0|   0|   0|    0|    0|    0|    0|    0|
5 |   0|    0|   1|   0|    0|    0|    0|    1|    0|
6 |   0|    0|   0|   1|    0|    0|    0|    0|    0|

Desired Output:

   A|  B|  C|  
0| 1|  0|  0|      
1| 0|  1|  1|       
2| 0|  0|  1|       
3| 0|  1|  0|       
4| 1|  0|  0|       
5| 1|  0|  1|      
6| 0|  1|  0|     

I've tried using this code; however, I keep running into error messages saying that the column name is missing.

categories = {'A':'A','B': 'B','C': 'C'}
    def correct_categories(cols1):
        return [categories[cat] for col1 in cols1 for cat in categories.keys() if col1.startswith(cat)]
        
rslt = df3.groupby(correct_categories(df3.columns),axis=1).sum()
print(rslt)

Error Message: KeyError: 'A'

Upvotes: 1

Views: 74

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150785

Try extracting the first part with .str.split() or .str.extract, then groupby on axis=1:

# also groupby on
# df.columns.str.extract('^([^_]+)', expand=False)
df.groupby(df.columns.str.split('_').str[0], axis=1).sum()

Output:

   A  B  C
0  1  0  0
1  0  1  1
2  0  0  1
3  0  1  0
4  1  0  0
5  1  0  1
6  0  1  0

Upvotes: 1

Related Questions