Reputation: 6061
I'm trying to create the following data frame:
new_df = pd.DataFrame(data = percentage_default, columns =
df['purpose'].unique())
The variables I'm using are as follows:
percentage_default = [0.15238817285822592,
0.11568938193343899,
0.16602316602316602,
0.17011128775834658,
0.2778675282714055,
0.11212814645308924,
0.20116618075801748]
df['purpose'].unique() = array(['debt_consolidation', 'credit_card', 'all_other',
'home_improvement', 'small_business', 'major_purchase',
'educational'], dtype=object)
When I try to create this data frame I get the following error:
Shape of passed values is (1, 7), indices imply (7, 7)
To me it seemed like the shapes of the values and the indices were the same. Could someone explain what I'm missing here?
Thanks!
Upvotes: 0
Views: 984
Reputation: 413
You're creating a dataframe from a flat list. Calling pd.DataFrame(your_list), where your_list is a simple homogeneous list, creates one row for every element in that list. For your input:
percentage_default = [0.15238817285822592,
0.11568938193343899,
0.16602316602316602,
0.17011128775834658,
0.2778675282714055,
0.11212814645308924,
0.20116618075801748]
pandas will create a dataframe like this:
Column
0.15238817285822592
0.11568938193343899
0.16602316602316602
0.17011128775834658
0.2778675282714055
0.11212814645308924
0.20116618075801748
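Because of this, your dataframe has seven rows but only one column. You're passing seven column names for that single column, so the shape of the data doesn't match the shape implied by the labels, and pandas raises the error.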
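A quick way to check this yourself is to look at the shape pandas infers from a flat list. This is a minimal sketch with three shortened, made-up values and hypothetical column names `a`, `b`, `c` (none of them come from the question):

```python
import pandas as pd

# Shortened illustrative values, not the question's data
values = [0.15, 0.12, 0.17]

# A flat list becomes one column with one row per element
one_column = pd.DataFrame(values)
print(one_column.shape)  # (3, 1)

# Passing three column names for that single column fails
try:
    pd.DataFrame(values, columns=["a", "b", "c"])  # hypothetical names
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

The exact wording of the error message depends on the pandas version, but the cause is the same shape mismatch.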
If you wish to create a dataframe from a list with multiple columns, you need to nest more lists or tuples inside your original list. Each nested tuple/list will become a row in the dataframe, and each element in the nested tuple/list will become a new column. See this:
percentage_default = [(0.15238817285822592,
0.11568938193343899,
0.16602316602316602,
0.17011128775834658,
0.2778675282714055,
0.11212814645308924,
0.20116618075801748)] # nested tuple
We have one nested tuple in this list, so our dataframe will have 1 row with n columns, where n is the number of elements in the nested tuple (7). We can then pass your 7 column names:
percentage_default = [(0.15238817285822592,
0.11568938193343899,
0.16602316602316602,
0.17011128775834658,
0.2778675282714055,
0.11212814645308924,
0.20116618075801748)]
col_names = ['debt_consolidation', 'credit_card', 'all_other',
'home_improvement', 'small_business', 'major_purchase',
'educational']
new_df = pd.DataFrame(percentage_default, columns = col_names)
print(new_df)
debt_consolidation credit_card all_other home_improvement \
0 0.152388 0.115689 0.166023 0.170111
small_business major_purchase educational
0 0.277868 0.112128 0.201166
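If you already have the flat list and don't want to retype it as a nested tuple, wrapping it in another list at construction time should give the same one-row result. A minimal sketch, reusing the values and names from the question (the variable name `flat_values` is mine):

```python
import pandas as pd

# The original flat list from the question
flat_values = [0.15238817285822592,
               0.11568938193343899,
               0.16602316602316602,
               0.17011128775834658,
               0.2778675282714055,
               0.11212814645308924,
               0.20116618075801748]

col_names = ['debt_consolidation', 'credit_card', 'all_other',
             'home_improvement', 'small_business', 'major_purchase',
             'educational']

# [flat_values] is a list containing one row of seven values
new_df = pd.DataFrame([flat_values], columns=col_names)
print(new_df.shape)  # (1, 7)
```

In your own code, `df['purpose'].unique()` could be passed as `columns` in place of `col_names`, provided its order matches the order of the values.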
Upvotes: 1
Reputation: 306
Try rewriting your data as a dictionary that maps each column name to its value:
percentage_default = {
'debt_consolidation': 0.15238817285822592,
'credit_card': 0.11568938193343899,
...
}
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
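Note that a dictionary of plain scalars can't be passed to pd.DataFrame on its own; pandas asks for an index in that case. One way around it, sketched here with only the two entries shown above (the rest are elided in the answer and left out), is to wrap the dictionary in a list so it is treated as a single row:

```python
import pandas as pd

# Only the two entries spelled out in the answer; the others are omitted
percentage_default = {
    'debt_consolidation': 0.15238817285822592,
    'credit_card': 0.11568938193343899,
}

# A list holding one dict becomes one row, with the keys as column names
new_df = pd.DataFrame([percentage_default])
print(new_df)

# Alternative: keep the scalars and supply a one-element index
same_df = pd.DataFrame(percentage_default, index=[0])
```

If a single labelled column is enough, pd.Series(percentage_default) is another option.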
Upvotes: 1