marupav
marupav

Reputation: 335

How to split rows in dataframe based on specific names in python?

enter image description here

I have a dataframe with below structure:

Bronze  Age              0-30   31-60   60+
Country State   Region          
A         X     R1       100    200     300
B         Y     R2       150    220     140
Silver  Age             0-30    31-60   60+
Country State   Region          
A        X      R1      150     220     167
B        Y      R2      735     398     239
Gold    Age             0-30    31-60   60+
Country State   Region          
A        X      R1      232    564     878
B        Y      R2      112    233     543
Platinum    Age         0-30    31-60   60+
Country State   Region          
A        X      R1      646   756      323
B        Y      R2      434   877      978

Now, I want to separate each of the category like bronze, silver, gold and platinum into 4 different dataframes.

Sample for bronze output is:

Country State   Region  Age   Count     
A         X     R1      0-30    100
A         X     R1      31-60   200
A         X     R1      60+     300
B         Y     R2      0-30    150
B         Y     R2      31-60   220
B         Y     R2      60+     140

I am new to python and could not get any idea on how to do this kind of row split.

Upvotes: 1

Views: 1381

Answers (2)

jezrael
jezrael

Reputation: 863481

Create new column for distiguish ages, then use mask by isin and fillna names by forward filling method:

df['ages'] = df.iloc[:,0]

ages = ['Bronze','Silver','Gold', 'Platinum']
mask = df.ages.isin(ages)

df['ages'] = df.ages[mask]
df['ages'] = df.ages.fillna(method='ffill')
#print df

#print df.iloc[1,:].fillna(df.iloc[0,:])
df.columns = df.iloc[1,:].fillna(df.iloc[0,:])

df = df.dropna().rename(columns={'Bronze':'Ages'})

print df 
1  Country State Region 0-30 31-60  60+      Ages
2        A     X     R1  100   200  300    Bronze
3        B     Y     R2  150   220  140    Bronze
6        A     X     R1  150   220  167    Silver
7        B     Y     R2  735   398  239    Silver
10       A     X     R1  232   564  878      Gold
11       B     Y     R2  112   233  543      Gold
14       A     X     R1  646   756  323  Platinum
15       B     Y     R2  434   877  978  Platinum

Use melt for reshaping, then sort DataFrame by column Country with function sort_values:

df = pd.melt(df, 
             id_vars=['Country','State','Region', 'Ages'], 
             var_name='Age', 
             value_name='Count' )

df = df.sort_values(by=['Country', 'Age'])

print df
   Country State Region      Ages    Age Count
0        A     X     R1    Bronze   0-30   100
2        A     X     R1    Silver   0-30   150
4        A     X     R1      Gold   0-30   232
6        A     X     R1  Platinum   0-30   646
8        A     X     R1    Bronze  31-60   200
10       A     X     R1    Silver  31-60   220
12       A     X     R1      Gold  31-60   564
14       A     X     R1  Platinum  31-60   756
16       A     X     R1    Bronze    60+   300
18       A     X     R1    Silver    60+   167
20       A     X     R1      Gold    60+   878
22       A     X     R1  Platinum    60+   323
1        B     Y     R2    Bronze   0-30   150
3        B     Y     R2    Silver   0-30   735
5        B     Y     R2      Gold   0-30   112
7        B     Y     R2  Platinum   0-30   434
9        B     Y     R2    Bronze  31-60   220
11       B     Y     R2    Silver  31-60   398
13       B     Y     R2      Gold  31-60   233
15       B     Y     R2  Platinum  31-60   877
17       B     Y     R2    Bronze    60+   140
19       B     Y     R2    Silver    60+   239
21       B     Y     R2      Gold    60+   543
23       B     Y     R2  Platinum    60+   978

Create dictionary d for storing DataFrames and fill it:

d = {}

for age in ages:
    #print df[df.Ages == age]
    d.update({age : df[df.Ages == age].drop('Ages', axis=1).reset_index(drop=True)})

print d['Bronze']  
  Country State Region    Age Count
0       A     X     R1   0-30   100
1       A     X     R1  31-60   200
2       A     X     R1    60+   300
3       B     Y     R2   0-30   150
4       B     Y     R2  31-60   220
5       B     Y     R2    60+   140
print d['Silver']   
  Country State Region    Age Count
0       A     X     R1   0-30   150
1       A     X     R1  31-60   220
2       A     X     R1    60+   167
3       B     Y     R2   0-30   735
4       B     Y     R2  31-60   398
5       B     Y     R2    60+   239

Upvotes: 1

iboboboru
iboboboru

Reputation: 1102

Hi something might this might work, not tested. Select based on the multi-index, reset_index and melt.

dfBronze =  df.ix["Bronze"].reset_index()
dfBronze =  dfBronze.melt(id_vars=['Country','State','Region'], value_vars=['Age'])

Upvotes: 0

Related Questions