Reputation: 465
Okay so I went through some long way of getting my dataframe to look like this so that I was able to graph it:
data_mth GROUP
201504 499 and below 0.001806
201505 499 and below 0.007375
201506 499 and below 0.000509
201507 499 and below 0.007344
201504 500 - 599 0.016672
201505 500 - 599 0.011473
201506 500 - 599 0.017733
201507 500 - 599 0.017651
201504 800 - 899 0.472784
201505 800 - 899 0.516837
201506 800 - 899 0.169811
201507 800 - 899 0.293966
201504 900 and above 0.065144
201505 900 and above 0.226626
201506 900 and above 0.251585
201507 900 and above 0.299850
Because of how much space this way was taking up I had to modify my code, and I have this dataframe now:
ptnr_cur_vntg_scor_band 499 and below 500 - 599 800 - 899 900 and above
data_mth
201504 0.001806 0.016672 0.472784 0.065144
201505 0.007375 0.011473 0.516837 0.226626
201506 0.000509 0.017733 0.169811 0.251585
201507 0.007344 0.017651 0.293966 0.299850
What is a good way to manipulate the second data frame to look like the first one?
My current code looks like so:
df = self.bunch['occ_data.all_data']
df = cpr.filter(df, 'ccm_acct_status', 'Open', 'Open-Inactive', 'Open-Active', 'OpenFraud', 'New')
df = df.groupby(['ptnr_cur_vntg_scor_band', 'data_mth']).sum()['ccm_curr_vntg_cnt']
df = df.unstack(0).fillna(0)
df.loc[:,"499andbelow":"NoVantageScore"] = df.loc[:, "499andbelow":"NoVantageScore"].div(df.sum(axis=1), axis=0)
df = df.fillna(0)
Its output is the second dataframe above.
Upvotes: 2
Views: 115
Reputation: 13913
import io
import pandas as pd
data = io.StringIO('''\
499 and below,500 - 599,800 - 899,900 and above
201504,0.001806,0.016672,0.472784,0.065144
201505,0.007375,0.011473,0.516837,0.226626
201506,0.000509,0.017733,0.169811,0.251585
201507,0.007344,0.017651,0.293966,0.299850
''')
df = pd.read_csv(data)
df.index.name = 'data_mth'
df.columns.name = 'ptnr_cur_vntg_scor_band'
print(df)
# ptnr_cur_vntg_scor_band 499 and below 500 - 599 800 - 899 900 and above
# data_mth
# 201504 0.001806 0.016672 0.472784 0.065144
# 201505 0.007375 0.011473 0.516837 0.226626
# 201506 0.000509 0.017733 0.169811 0.251585
# 201507 0.007344 0.017651 0.293966 0.299850
s = df.unstack().swaplevel()
s.index.names = 'data_mth', 'GROUP'
print(s)
Output:
data_mth GROUP
201504 499 and below 0.001806
201505 499 and below 0.007375
201506 499 and below 0.000509
201507 499 and below 0.007344
201504 500 - 599 0.016672
201505 500 - 599 0.011473
201506 500 - 599 0.017733
201507 500 - 599 0.017651
201504 800 - 899 0.472784
201505 800 - 899 0.516837
201506 800 - 899 0.169811
201507 800 - 899 0.293966
201504 900 and above 0.065144
201505 900 and above 0.226626
201506 900 and above 0.251585
201507 900 and above 0.299850
dtype: float64
Upvotes: 4