Christian Torres

Reputation: 303

Merging DataFrames and their totals

I have 14 DataFrames.

They all have an index and one column called 'Total'.

Here is an example of 1 DataFrame: https://i.gyazo.com/8b31f92a469e31df89a29e4588427362.png

The index is 'Res Area' and the column is 'Total'.

What I want to do is merge them all into one DataFrame whose index holds the name of each df and whose single column, 'Total', lets me compare all of these DataFrames.

I've tried putting the dfs in a dictionary, with the key being the name of the df and the value being the sum of its top 10 Totals, but it doesn't work:

df = pd.DataFrame({'Res Area': resarea_df.Total[:10].sum(), 'Year Built': yearbuilt_df.Total[:10].sum(), 'Retail Area': retailarea_df.Total[:10].sum()})

I get an error that says:

If using all scalar values, you must pass an index

I just want to merge all the dfs into one df so I can compare the sum of each df's top 10 Totals, which I will then plot on a graph.

Upvotes: 0

Views: 35

Answers (1)

ALollz

Reputation: 59519

You are calling the wrong constructor for your DataFrame. With a dictionary of scalar values whose keys should become the index, use the DataFrame.from_dict constructor:

import pandas as pd

data = {'data1': 1, 'data2': 2, 'data3': 15}
pd.DataFrame.from_dict(data, orient='index', columns=['Total'])
#       Total
#data1      1
#data2      2
#data3     15
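Applied to the setup in the question, this could look roughly like the sketch below. The three small frames are made-up stand-ins for the real ones (only the names resarea_df, yearbuilt_df and retailarea_df come from the question), and it assumes each frame is already sorted so that the first 10 rows are its top 10.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames; each has one 'Total' column.
resarea_df = pd.DataFrame({'Total': range(1, 13)})     # values 1..12
yearbuilt_df = pd.DataFrame({'Total': range(10, 22)})  # values 10..21
retailarea_df = pd.DataFrame({'Total': [5] * 12})      # twelve 5s

frames = {
    'Res Area': resarea_df,
    'Year Built': yearbuilt_df,
    'Retail Area': retailarea_df,
}

# One scalar per frame: the sum of its first 10 'Total' values.
# Assumption: the first 10 rows are the top 10 (frames already sorted).
top10_sums = {
    name: int(frame['Total'].iloc[:10].sum())
    for name, frame in frames.items()
}

# The dictionary keys become the index; the scalars fill the 'Total' column.
summary = pd.DataFrame.from_dict(top10_sums, orient='index', columns=['Total'])
print(summary)
```

With all 14 frames in the dictionary, the resulting single-column frame should be suitable for something like summary.plot.bar(), provided matplotlib is installed.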

To explain the problem you are having: when constructing a DataFrame with pd.DataFrame and a dictionary, the default is to make the keys of the dictionary the columns. Typically the values of the passed dictionary are array-like, which allows pandas to determine how many rows to make. However, with all scalar values and no index, there is no way to know how many rows it needs.

data = {'data1': 1, 'data2': 2, 'data3': 15}
pd.DataFrame(data)
#ValueError: If using all scalar values, you must pass an index

To do this correctly, you would specify an index:

pd.DataFrame(data, index=[0])
#   data1  data2  data3
#0      1      2     15

Or make at least one value of data array-like:

data2 = {'data1': 1, 'data2': 2, 'data3': [15]}
pd.DataFrame(data2)
#   data1  data2  data3
#0      1      2     15
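Both of these workarounds give one row with the keys as columns, which is the transpose of the layout asked for. As a possible alternative, not part of the original answer, a Series can be built straight from the scalar dictionary and then turned into a one-column frame. A minimal sketch using the same sample data:

```python
import pandas as pd

data = {'data1': 1, 'data2': 2, 'data3': 15}

# A Series accepts a dict of scalars directly; the keys become the index.
# to_frame() then uses the Series name as the column label.
totals = pd.Series(data, name='Total').to_frame()
print(totals)
```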

Upvotes: 1
