Scott
Scott

Reputation: 465

Manipulating Data Frame with Pandas Python

Okay so I went through some long way of getting my dataframe to look like this so that I was able to graph it:

data_mth  GROUP
201504    499 and below    0.001806
201505    499 and below    0.007375
201506    499 and below    0.000509
201507    499 and below    0.007344
201504    500 - 599        0.016672
201505    500 - 599        0.011473
201506    500 - 599        0.017733
201507    500 - 599        0.017651
201504    800 - 899        0.472784
201505    800 - 899        0.516837
201506    800 - 899        0.169811
201507    800 - 899        0.293966
201504    900 and above    0.065144
201505    900 and above    0.226626
201506    900 and above    0.251585
201507    900 and above    0.299850

Because of how much space this way was taking up I had to modify my code, and I have this dataframe now:

ptnr_cur_vntg_scor_band  499 and below  500 - 599  800 - 899  900 and above
data_mth
201504                   0.001806       0.016672   0.472784   0.065144
201505                   0.007375       0.011473   0.516837   0.226626
201506                   0.000509       0.017733   0.169811   0.251585
201507                   0.007344       0.017651   0.293966   0.299850

What is a good way to manipulate the second data frame to look like the first one?

My current code looks like so:

df = self.bunch['occ_data.all_data']
df = cpr.filter(df, 'ccm_acct_status', 'Open', 'Open-Inactive', 'Open-Active', 'OpenFraud', 'New')
df = df.groupby(['ptnr_cur_vntg_scor_band', 'data_mth']).sum()['ccm_curr_vntg_cnt']

df = df.unstack(0).fillna(0)

df.loc[:,"499andbelow":"NoVantageScore"] = df.loc[:, "499andbelow":"NoVantageScore"].div(df.sum(axis=1), axis=0)
df = df.fillna(0)

Its output is the second dataframe above.

Upvotes: 2

Views: 115

Answers (1)

Alicia Garcia-Raboso
Alicia Garcia-Raboso

Reputation: 13913

import io
import pandas as pd

data = io.StringIO('''\
499 and below,500 - 599,800 - 899,900 and above
201504,0.001806,0.016672,0.472784,0.065144
201505,0.007375,0.011473,0.516837,0.226626
201506,0.000509,0.017733,0.169811,0.251585
201507,0.007344,0.017651,0.293966,0.299850
''')

df = pd.read_csv(data)
df.index.name = 'data_mth'
df.columns.name = 'ptnr_cur_vntg_scor_band'
print(df)

# ptnr_cur_vntg_scor_band  499 and below  500 - 599  800 - 899  900 and above
# data_mth                                                                   
# 201504                        0.001806   0.016672   0.472784       0.065144
# 201505                        0.007375   0.011473   0.516837       0.226626
# 201506                        0.000509   0.017733   0.169811       0.251585
# 201507                        0.007344   0.017651   0.293966       0.299850

s = df.unstack().swaplevel()
s.index.names = 'data_mth', 'GROUP'
print(s)

Output:

data_mth  GROUP   
201504    499 and below    0.001806
201505    499 and below    0.007375
201506    499 and below    0.000509
201507    499 and below    0.007344
201504    500 - 599        0.016672
201505    500 - 599        0.011473
201506    500 - 599        0.017733
201507    500 - 599        0.017651
201504    800 - 899        0.472784
201505    800 - 899        0.516837
201506    800 - 899        0.169811
201507    800 - 899        0.293966
201504    900 and above    0.065144
201505    900 and above    0.226626
201506    900 and above    0.251585
201507    900 and above    0.299850
dtype: float64

Upvotes: 4

Related Questions