bugsyb

Reputation: 6061

Can't Create pandas DataFrame in python (Wrong Shape)

I'm trying to create the following data frame

new_df = pd.DataFrame(data=percentage_default, columns=df['purpose'].unique())

The variables I'm using are as follows

percentage_default = [0.15238817285822592,
0.11568938193343899,
0.16602316602316602,
0.17011128775834658,
0.2778675282714055,
0.11212814645308924,
0.20116618075801748]

df['purpose'].unique() = array(['debt_consolidation', 'credit_card', 'all_other',
   'home_improvement', 'small_business', 'major_purchase',
   'educational'], dtype=object)

When I try to create this data frame I get the following error:

Shape of passed values is (1, 7), indices imply (7, 7)

To me it seemed like the shape of the values and the indices were the same. Could someone explain what I'm missing here?

Thanks!

Upvotes: 0

Views: 984

Answers (2)

jsxgd

Reputation: 413

You're creating a dataframe from a flat list. Calling pd.DataFrame(your_list), where your_list is a simple homogeneous list, creates one row for every element in that list, all in a single column. For your input:

percentage_default = [0.15238817285822592,
                      0.11568938193343899,
                      0.16602316602316602,
                      0.17011128775834658,
                      0.2778675282714055,
                      0.11212814645308924,
                      0.20116618075801748]

pandas will create a dataframe like this:

Column
0.15238817285822592
0.11568938193343899
0.16602316602316602
0.17011128775834658
0.2778675282714055
0.11212814645308924
0.20116618075801748

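Because of this, your dataframe has seven rows but only one column. Passing seven column names tells pandas to expect seven columns, so the shape of the data does not match the shape implied by the names, and pandas raises the error you see.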
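To see the mismatch directly, here is a minimal sketch that compares the shapes pandas infers. It reuses the values and names from the question; the variable names purposes, flat and wide are only illustrative.

```python
import pandas as pd

percentage_default = [0.15238817285822592, 0.11568938193343899,
                      0.16602316602316602, 0.17011128775834658,
                      0.2778675282714055, 0.11212814645308924,
                      0.20116618075801748]

purposes = ['debt_consolidation', 'credit_card', 'all_other',
            'home_improvement', 'small_business', 'major_purchase',
            'educational']

# A flat list becomes one column with one row per element.
flat = pd.DataFrame(percentage_default)
print(flat.shape)  # (7, 1)

# Seven names for a single column of data cannot be reconciled.
try:
    pd.DataFrame(percentage_default, columns=purposes)
except ValueError as err:
    print("ValueError:", err)

# Wrapping the list in another list makes it a single row of seven values.
wide = pd.DataFrame([percentage_default], columns=purposes)
print(wide.shape)  # (1, 7)
```

The exact wording of the error message may differ between pandas versions.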

If you wish to create a dataframe from a list with multiple columns, you need to nest more lists or tuples inside your original list. Each nested tuple/list will become a row in the dataframe, and each element in the nested tuple/list will become a new column. See this:

percentage_default = [(0.15238817285822592,
                       0.11568938193343899,
                       0.16602316602316602,
                       0.17011128775834658,
                       0.2778675282714055,
                       0.11212814645308924,
                       0.20116618075801748)] # nested tuple

We have one nested tuple in this list, so our dataframe will have 1 row with n columns, where n is the number of elements in the nested tuple (7). We can then pass your 7 column names:

import pandas as pd

percentage_default = [(0.15238817285822592,
                       0.11568938193343899,
                       0.16602316602316602,
                       0.17011128775834658,
                       0.2778675282714055,
                       0.11212814645308924,
                       0.20116618075801748)]

col_names = ['debt_consolidation', 'credit_card', 'all_other',
             'home_improvement', 'small_business', 'major_purchase',
             'educational']

new_df = pd.DataFrame(percentage_default, columns = col_names)
print(new_df)


    debt_consolidation  credit_card  all_other  home_improvement  \
0            0.152388     0.115689   0.166023          0.170111   

   small_business  major_purchase  educational  
0        0.277868        0.112128     0.201166 

Upvotes: 1

Fedir Alifirenko

Reputation: 306

Try rewriting your data as a dictionary that maps each column name to its value:

percentage_default = {
    'debt_consolidation': 0.15238817285822592,
    'credit_card': 0.11568938193343899,
    ...
}

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
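One caveat with the dictionary form: a dict whose values are all scalars cannot be passed to pd.DataFrame on its own, because pandas then has no way to infer the row index. A sketch of one way around this, assuming a single-row result is what you want, is to wrap the dict in a list. Only the two entries shown above are filled in here; the remaining purposes would follow the same pattern.

```python
import pandas as pd

percentage_default = {
    'debt_consolidation': 0.15238817285822592,
    'credit_card': 0.11568938193343899,
}

# Each dict in the list becomes one row; its keys become the column names.
new_df = pd.DataFrame([percentage_default])
print(new_df.shape)  # (1, 2)

# Equivalent alternative: keep the bare dict and supply an explicit index.
same_df = pd.DataFrame(percentage_default, index=[0])
print(same_df.equals(new_df))
```

If the names and values start out as two separate sequences, dict(zip(names, values)) builds the same dictionary.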

Upvotes: 1
