Reputation: 335
I have a dataframe with below structure:
Bronze Age 0-30 31-60 60+
Country State Region
A X R1 100 200 300
B Y R2 150 220 140
Silver Age 0-30 31-60 60+
Country State Region
A X R1 150 220 167
B Y R2 735 398 239
Gold Age 0-30 31-60 60+
Country State Region
A X R1 232 564 878
B Y R2 112 233 543
Platinum Age 0-30 31-60 60+
Country State Region
A X R1 646 756 323
B Y R2 434 877 978
Now, I want to separate each of the category like bronze, silver, gold and platinum into 4 different dataframes.
Sample for bronze output is:
Country State Region Age Count
A X R1 0-30 100
A X R1 31-60 200
A X R1 60+ 300
B Y R2 0-30 150
B Y R2 31-60 220
B Y R2 60+ 140
I am new to python and could not get any idea on how to do this kind of row split.
Upvotes: 1
Views: 1381
Reputation: 863481
Create new column for distiguish ages
, then use mask
by isin
and fillna
names
by forward filling method:
df['ages'] = df.iloc[:,0]
ages = ['Bronze','Silver','Gold', 'Platinum']
mask = df.ages.isin(ages)
df['ages'] = df.ages[mask]
df['ages'] = df.ages.fillna(method='ffill')
#print df
#print df.iloc[1,:].fillna(df.iloc[0,:])
df.columns = df.iloc[1,:].fillna(df.iloc[0,:])
df = df.dropna().rename(columns={'Bronze':'Ages'})
print df
1 Country State Region 0-30 31-60 60+ Ages
2 A X R1 100 200 300 Bronze
3 B Y R2 150 220 140 Bronze
6 A X R1 150 220 167 Silver
7 B Y R2 735 398 239 Silver
10 A X R1 232 564 878 Gold
11 B Y R2 112 233 543 Gold
14 A X R1 646 756 323 Platinum
15 B Y R2 434 877 978 Platinum
Use melt
for reshaping, then sort DataFrame by column Country
with function sort_values
:
df = pd.melt(df,
id_vars=['Country','State','Region', 'Ages'],
var_name='Age',
value_name='Count' )
df = df.sort_values(by=['Country', 'Age'])
print df
Country State Region Ages Age Count
0 A X R1 Bronze 0-30 100
2 A X R1 Silver 0-30 150
4 A X R1 Gold 0-30 232
6 A X R1 Platinum 0-30 646
8 A X R1 Bronze 31-60 200
10 A X R1 Silver 31-60 220
12 A X R1 Gold 31-60 564
14 A X R1 Platinum 31-60 756
16 A X R1 Bronze 60+ 300
18 A X R1 Silver 60+ 167
20 A X R1 Gold 60+ 878
22 A X R1 Platinum 60+ 323
1 B Y R2 Bronze 0-30 150
3 B Y R2 Silver 0-30 735
5 B Y R2 Gold 0-30 112
7 B Y R2 Platinum 0-30 434
9 B Y R2 Bronze 31-60 220
11 B Y R2 Silver 31-60 398
13 B Y R2 Gold 31-60 233
15 B Y R2 Platinum 31-60 877
17 B Y R2 Bronze 60+ 140
19 B Y R2 Silver 60+ 239
21 B Y R2 Gold 60+ 543
23 B Y R2 Platinum 60+ 978
Create dictionary d
for storing DataFrames
and fill it:
d = {}
for age in ages:
#print df[df.Ages == age]
d.update({age : df[df.Ages == age].drop('Ages', axis=1).reset_index(drop=True)})
print d['Bronze']
Country State Region Age Count
0 A X R1 0-30 100
1 A X R1 31-60 200
2 A X R1 60+ 300
3 B Y R2 0-30 150
4 B Y R2 31-60 220
5 B Y R2 60+ 140
print d['Silver']
Country State Region Age Count
0 A X R1 0-30 150
1 A X R1 31-60 220
2 A X R1 60+ 167
3 B Y R2 0-30 735
4 B Y R2 31-60 398
5 B Y R2 60+ 239
Upvotes: 1
Reputation: 1102
Hi something might this might work, not tested. Select based on the multi-index, reset_index and melt.
dfBronze = df.ix["Bronze"].reset_index()
dfBronze = dfBronze.melt(id_vars=['Country','State','Region'], value_vars=['Age'])
Upvotes: 0