Sunnie
Sunnie

Reputation: 91

Consolidating categories in columns

I have a df with a race column, which has 4 categories. However, I would like to only have three categories by combining the last two categories. This is what my current df looks like:


Year |State| Race |Sex | population
-----|-----|------|----|----------
2006 | CA  | 1    |1   | 504893
2006 | CA  | 1    |2   | 783760
2006 | CA  | 2    |1   | 800062
2006 | CA  | 2    |2   | 768300
2006 | CA  | 3    |1   | 347827
2006 | CA  | 3    |2   | 344672
2006 | CA  | 4    |1   | 565344
2006 | CA  | 4    |2   | 356779

I want to consolidate the race==3 and race ==4 into one value (which would be race ==3). So my new df output would look something like this:


Year |State| Race |Sex | population
-----|-----|------|----|----------
2006 | CA  | 1    |1   | 504893
2006 | CA  | 1    |2   | 783760
2006 | CA  | 2    |1   | 800062
2006 | CA  | 2    |2   | 768300
2006 | CA  | 3    |1   | 913171
2006 | CA  | 3    |2   | 701451

Notice, the last two rows in this df are the sum of the the last 4 rows on the last df (by sex). How can I accomplish this?

Upvotes: 0

Views: 54

Answers (2)

Vaishali
Vaishali

Reputation: 38415

Replace Race 4 by 3 and group data by Race + Sex

df.loc[df['Race']==4, 'Race']=3
df = df.groupby(['Race','Sex'],as_index=False)['population'].sum()

You get

Year State Race Sex  population    
2006  CA   1    1    504893
2006  CA   1    2    783760
2006  CA   2    1    800062
2006  CA   2    2    768300
2006  CA   3    1    913171
2006  CA   3    2    701451

Upvotes: 1

Mr. For Example
Mr. For Example

Reputation: 4313

Example code to show you how to do it, for simplify I just using the last two row of your dataset:

import pandas as pd

df = pd.DataFrame({'Race': [3, 3, 4, 4], 'Sex': [1, 2, 1, 2], 'population': [347827, 344672, 565344, 356779]})
for i in [1, 2]:
    df.loc[(df['Sex'] == i) & (df['Race'] == 3), 'population'] += df.loc[(df['Sex'] == i) & (df['Race'] == 4), 'population'].item()
df.drop(df.loc[df['Race'] == 4].index, inplace=True)

print(df)

Outputs:

   Race  Sex  population
0     3    1      913171
1     3    2      701451

Upvotes: 1

Related Questions