itti_da
itti_da

Reputation: 65

Creating nested columns in python dataframe

I have 3 columns namely Models(should be taken as index), Accuracy without normalization, Accuracy with normalization (zscore, minmax, maxabs, robust) and these are required to be created as:

 ------------------------------------------------------------------------------------
|   Models  |  Accuracy without normalization    |      Accuracy with normalization  |
|           |                                    |-----------------------------------|
|           |                                    | zscore | minmax | maxabs | robust |
 ------------------------------------------------------------------------------------

dfmod-> Models column
dfacc-> Accuracy without normalization
dfacc1-> Accuracy with normalization - zscore
dfacc2-> Accuracy with normalization - minmax
dfacc3-> Accuracy with normalization - maxabs
dfacc4-> Accuracy with normalization - robust
dfout=pd.DataFrame({('Accuracy without Normalization'):{dfacc},
     ('Accuracy using Normalization','zscore'):{dfacc1},
     ('Accuracy using Normalization','minmax'):{dfacc2},
     ('Accuracy using Normalization','maxabs'):{dfacc3},
     ('Accuracy using Normalization','robust'):{dfacc4},
   },index=dfmod
)

I was trying to do something like this but i can't figure out any further

Test data:

qda    0.6333       0.6917      0.5917      0.6417     0.5833
svm    0.5333       0.6917      0.5333      0.575      0.575
lda    0.5333       0.6583      0.5333      0.5667     0.5667
lr     0.5333       0.65        0.4917      0.5667     0.5667
dt     0.5333       0.65        0.4917      0.5667     0.5667
rc     0.5083       0.6333      0.4917      0.525      0.525
nb     0.5          0.625       0.475       0.5        0.4833
rfc    0.5          0.625       0.4417      0.4917     0.4583
knn    0.3917       0.6         0.4417      0.4833     0.45
et     0.375        0.5333      0.4333      0.4667     0.45
dc     0.375        0.5333      0.4333      0.4667     0.425
qds    0.3417       0.5333      0.4         0.4583     0.3667
lgt    0.3417       0.525       0.3917      0.45       0.3583
lt     0.2333       0.45        0.3917      0.4167     0.3417

These are values for respective subcolumns in order specified in the table above

Upvotes: 6

Views: 3379

Answers (1)

Shayan
Shayan

Reputation: 6295

There's a dirty way to do this, I'll write about it till someone answers with a better idea. Here we go:

import pandas as pd

# I assume that you can read raw data named test.csv by pandas and
# set header = None cause you mentioned the Test data without any headers, so:
df = pd.read_csv("test.csv", header = None)

# Then define preferred Columns! 
MyColumns = pd.MultiIndex.from_tuples([("Models" , ""),
                                       ("Accuracy without normalization" , ""),
                                       ("Accuracy with normalization" , "zscore"),
                                       ("Accuracy with normalization" , "minmax"),
                                       ("Accuracy with normalization" , "maxabs"),
                                       ("Accuracy with normalization" , "robust")])

# Create new DataFrame with specified Columns, after this you should pass values 
New_DataFrame = pd.DataFrame(df , columns = MyColumns)

# a loop for passing values
for item in range(len(MyColumns)):
    New_DataFrame.loc[: , MyColumns[item]] = df.iloc[: , item]

This gives me:

enter image description here

after all, if you want to set Models as the index of New_DataFrame, You can continue with:

New_DataFrame.set_index(New_DataFrame.columns[0][0] , inplace=True)
New_DataFrame

This gives me:

enter image description here

Upvotes: 3

Related Questions