Nilani Algiriyage

Reputation: 35776

Constructing DataFrame from values in variables yields "ValueError: If using all scalar values, you must pass an index"

I have two variables as follows.

a = 2
b = 3

I want to construct a DataFrame from this:

df2 = pd.DataFrame({'A':a, 'B':b})

This generates an error:

ValueError: If using all scalar values, you must pass an index

I tried this also:

df2 = (pd.DataFrame({'a':a, 'b':b})).reset_index()

This gives the same error message. How do I do what I want?

Upvotes: 864

Views: 1538136

Answers (24)

DSM

Reputation: 353499

The error message says that if you're passing scalar values, you have to pass an index. So you can either not use scalar values for the columns -- e.g. use a list:

>>> df = pd.DataFrame({'A': [a], 'B': [b]})
>>> df
   A  B
0  2  3

or use scalar values and pass an index:

>>> df = pd.DataFrame({'A': a, 'B': b}, index=[0, 3])
>>> df
   A  B
0  2  3
3  2  3
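
To see both cases side by side, here is a minimal sketch that reproduces the error and then applies the two fixes above. The names `message`, `wrapped` and `indexed` are made up for illustration only.

```python
import pandas as pd

a = 2
b = 3

# All-scalar values and no index: pandas cannot infer how many rows to build.
try:
    pd.DataFrame({'A': a, 'B': b})
except ValueError as exc:
    message = str(exc)
    print(message)

# Fix 1: wrap each scalar in a list, which gives exactly one row.
wrapped = pd.DataFrame({'A': [a], 'B': [b]})

# Fix 2: keep the scalars and pass an index; each scalar is repeated per label.
indexed = pd.DataFrame({'A': a, 'B': b}, index=[0])

print(wrapped)
print(indexed)
```

Both fixes give a single row here; the index version only differs once you pass more than one label.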

Upvotes: 1249

firelynx

Reputation: 32234

Pandas magic at work. All logic is out.

The error message "ValueError: If using all scalar values, you must pass an index" says you must pass an index.

This does not necessarily mean that passing an index makes pandas do what you want it to do.

When you pass an index, pandas will treat your dictionary keys as column names and the values as what the column should contain for each of the values in the index.

a = 2
b = 3
df2 = pd.DataFrame({'A':a,'B':b}, index=[1])

    A   B
1   2   3

Passing a larger index:

df2 = pd.DataFrame({'A':a,'B':b}, index=[1, 2, 3, 4])

    A   B
1   2   3
2   2   3
3   2   3
4   2   3

An index is usually generated automatically by a dataframe when none is given. However, pandas does not know how many rows of 2 and 3 you want. You can, however, be more explicit about it:

df2 = pd.DataFrame({'A':[a]*4,'B':[b]*4})
df2

    A   B
0   2   3
1   2   3
2   2   3
3   2   3

The default index is 0-based, though.

I would recommend always passing a dictionary of lists to the dataframe constructor when creating dataframes. It's easier for other developers to read. Pandas has a lot of caveats; don't require other developers to be experts in all of them in order to read your code.
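
As a rough check of the point above, this sketch builds the same four-row frame both ways and compares the values. `n_rows` and the frame names are assumptions chosen for the example.

```python
import pandas as pd

a = 2
b = 3
n_rows = 4  # assumed row count, chosen only for this example

# Scalars plus an explicit index: each scalar is repeated for every label.
from_scalars = pd.DataFrame({'A': a, 'B': b}, index=range(n_rows))

# Lists of equal length: the row count is visible in the data itself.
from_lists = pd.DataFrame({'A': [a] * n_rows, 'B': [b] * n_rows})

# Compare the underlying values of the two frames element by element.
same_values = bool((from_scalars.to_numpy() == from_lists.to_numpy()).all())
print(same_values)
```

The data ends up the same either way; the dictionary-of-lists form just makes the number of rows obvious to the reader.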

Upvotes: 19

S.V

Reputation: 2793

The input does not have to be a list of records - it can be a single dictionary as well:

pd.DataFrame.from_records({'a':1,'b':2}, index=[0])
   a  b
0  1  2

Which seems to be equivalent to:

pd.DataFrame({'a':1,'b':2}, index=[0])
   a  b
0  1  2

Upvotes: 7

cottontail

Reputation: 23449

If the data is a dictionary, one way to construct a dataframe is to call pd.json_normalize(), which constructs a flat dataframe (the index is created under the covers). Its main use case is to flatten a nested dictionary, but it works on a flat dictionary as well.

df = pd.json_normalize({'A': 2, 'B': 3})


   A  B
0  2  3
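
For the nested case mentioned above, a small sketch might look like this. The `record` dictionary is made up; with the default separator the inner keys show up as dotted column names.

```python
import pandas as pd

# Hypothetical nested record; the inner dict under 'B' gets flattened.
record = {'A': 2, 'B': {'x': 3, 'y': 4}}

flat = pd.json_normalize(record)

# One row, with the nested keys joined to their parent key by a dot.
print(flat)
print(sorted(flat.columns))
```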

In general, it's possible to construct a dataframe by first initializing an empty dataframe and then filling it from the dictionary.

  • For a wide dataframe:

    d = {'A': 2, 'B': 3}
    df = pd.DataFrame(columns=d.keys())
    df.loc[0] = d
    
    
       A  B
    0  2  3
    
  • For a long dataframe:

    df = pd.DataFrame()
    df['col'] = {'A': 2, 'B': 3}
    
    
       col
    A    2
    B    3
    

If the data consists of scalar values (as in the OP), wrap them in a list/tuple and pass that to the dataframe constructor (optionally passing column/index labels as well). A nested list constructs a wide dataframe and a flat list constructs a long dataframe.

a = 2
b = 3

df1 = pd.DataFrame([[a, b]], columns=['A', 'B'])


   A  B
0  2  3



df2 = pd.DataFrame([a, b], columns=['A'])


   A
0  2
1  3

Upvotes: 7

You could try this:

df2 = pd.DataFrame.from_dict({'a':a,'b':b}, orient = 'index')

Upvotes: 5

CN_Cabbage

Reputation: 445

To make sense of this ValueError, you need to understand DataFrames and "scalar values".
To create a DataFrame from a dict, at least one array-like value (or an explicit index) is needed.

IMO, an array is itself indexed.
Therefore, if there is an array-like value, there is no need to specify an index.
E.g. the indices of the elements in ['a', 's', 'd', 'f'] are 0, 1, 2, 3 respectively.

df_array_like = pd.DataFrame({
    'col' : 10086,
    'col_2' : True,
    'col_3' : "'at least one array'",
    'col_4' : ['one array is arbitrary length', 'multi arrays should be the same length']}) 
print("df_array_like: \n", df_array_like)

Output:

df_array_like: 
      col  col_2                 col_3                                   col_4
0  10086   True  'at least one array'           one array is arbitrary length
1  10086   True  'at least one array'  multi arrays should be the same length

As the output shows, the index of the DataFrame is 0 and 1,
which coincides with the index of the array ['one array is arbitrary length', 'multi arrays should be the same length'].

If you comment out 'col_4', it will raise

ValueError("If using all scalar values, you must pass an index")

because scalar values (integer, bool, and string) do not have an index.
Note that Index(...) must be called with a collection of some kind.
Since the index is used to locate the rows of the DataFrame,
it should be an array, e.g.

df_scalar_value = pd.DataFrame({
'col' : 10086,
'col_2' : True,
'col_3' : "'at least one array'"
}, index = ['fst_row','snd_row','third_row']) 
print("df_scalar_value: \n", df_scalar_value)

Output:

df_scalar_value: 
              col  col_2                 col_3
fst_row    10086   True  'at least one array'
snd_row    10086   True  'at least one array'
third_row  10086   True  'at least one array'
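
The rule described above can be wrapped in a small helper. This is only a sketch: `frame_from_mapping` is a hypothetical function, not part of pandas, and it assumes a one-row result is what you want when every value is a scalar.

```python
import pandas as pd
from pandas.api.types import is_scalar


def frame_from_mapping(mapping):
    """Build a DataFrame from a dict (hypothetical helper, not a pandas API)."""
    if all(is_scalar(value) for value in mapping.values()):
        # Every value is a scalar, so supply a one-row index explicitly.
        return pd.DataFrame(mapping, index=[0])
    # At least one array-like value exists; pandas infers the index from it.
    return pd.DataFrame(mapping)


one_row = frame_from_mapping({'col': 10086, 'col_2': True, 'col_3': 'text'})
two_rows = frame_from_mapping({'col': 10086, 'col_4': ['first', 'second']})
print(one_row.shape, two_rows.shape)
```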

I'm a beginner, I'm learning python and English. 👀

Upvotes: 10

M. John

Reputation: 99

import pandas as pd

a = 2
b = 3
d = {'A': a, 'B': b}  # not named `dict`, which would shadow the built-in

pd.DataFrame(pd.Series(d)).T
# .T transposes the DataFrame

Result:
    A   B
0   2   3

Upvotes: 9

NewBie

Reputation: 3584

You may try wrapping your dictionary into a list:

my_dict = {'A':1,'B':2}
pd.DataFrame([my_dict])
   A  B
0  1  2
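
The same pattern extends to several rows, since each dictionary in the list is treated as one record. A short sketch with made-up data, where the second record deliberately lacks a key:

```python
import pandas as pd

# Each dict is one row; the second record has no 'B' on purpose.
records = [{'A': 1, 'B': 2}, {'A': 3}]
df = pd.DataFrame(records)

# The missing value is filled with NaN rather than raising an error.
print(df)
```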

Upvotes: 264

Hank Gordon

Reputation: 137

I tried transpose() and it worked. Downside: You create a new object.

testdict1 = {'key1':'val1','key2':'val2','key3':'val3','key4':'val4'}

df = pd.DataFrame.from_dict(data=testdict1,orient='index')
print(df)
print(f'ID for DataFrame before Transpose: {id(df)}\n')

df = df.transpose()
print(df)
print(f'ID for DataFrame after Transpose: {id(df)}')

Output

         0
key1  val1
key2  val2
key3  val3
key4  val4
ID for DataFrame before Transpose: 1932797100424

   key1  key2  key3  key4
0  val1  val2  val3  val4
ID for DataFrame after Transpose: 1932797125448

Upvotes: 5

chenchuk

Reputation: 5742

Another option is to convert the scalars into lists on the fly using a dictionary comprehension:

df = pd.DataFrame(data={k: [v] for k, v in mydict.items()})

The expression {...} creates a new dict whose values are one-element lists, such as:

In [20]: mydict
Out[20]: {'a': 1, 'b': 2}

In [21]: mydict2 = { k: [v] for k, v in mydict.items()}

In [22]: mydict2
Out[22]: {'a': [1], 'b': [2]}

Upvotes: 2

DataYoda

Reputation: 825

The simplest option is:

d = {'A': a, 'B': b}  # a, b as in the question; `dict` would shadow the built-in
df = pd.DataFrame(d, index=np.arange(1))  # needs `import numpy as np`

Upvotes: 1

Moritz Molch

Reputation: 193

I usually use the following to quickly create a small table from dicts.

Let's say you have a dict where the keys are filenames and the values their corresponding filesizes, you could use the following code to put it into a DataFrame (notice the .items() call on the dict):

files = {'A.txt':12, 'B.txt':34, 'C.txt':56, 'D.txt':78}
filesFrame = pd.DataFrame(files.items(), columns=['filename','size'])
print(filesFrame)

  filename  size
0    A.txt    12
1    B.txt    34
2    C.txt    56
3    D.txt    78
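
If the dict view is not accepted directly (some older pandas versions reportedly complained about it), wrapping it in list() should behave the same. A minimal sketch, reusing the data above with a snake_case name of my own:

```python
import pandas as pd

files = {'A.txt': 12, 'B.txt': 34, 'C.txt': 56, 'D.txt': 78}

# list(...) turns the dict view into a list of (key, value) tuples, one per row.
files_frame = pd.DataFrame(list(files.items()), columns=['filename', 'size'])
print(files_frame)
```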

Upvotes: 15

Kalpana

Reputation: 123

Change your 'a' and 'b' values to lists, as follows:

a = [2]
b = [3]

then execute the same code as follows:

df2 = pd.DataFrame({'A':a,'B':b})
df2

and you'll get:

    A   B
0   2   3

Upvotes: 2

kamran kausar

Reputation: 4603

Convert Dictionary to Data Frame

col_dict_df = pd.Series(col_dict).to_frame('new_col').reset_index()

Give new name to Column

col_dict_df.columns = ['col1', 'col2']
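
Put together as something runnable, the two steps could look like the sketch below. `col_dict` is not defined in the answer, so the sample data is an assumption; the `alternative` one-liner is just another way to get the same columns.

```python
import pandas as pd

col_dict = {'A': 2, 'B': 3}  # assumed sample data; not defined in the answer

# The keys become the index; reset_index() moves them into a regular column.
col_dict_df = pd.Series(col_dict).to_frame('new_col').reset_index()
col_dict_df.columns = ['col1', 'col2']

# Alternative: name the index first, then name the values column on reset.
alternative = pd.Series(col_dict).rename_axis('col1').reset_index(name='col2')

print(col_dict_df)
print(alternative)
```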

Upvotes: 1

Elrond

Reputation: 2112

You need to create a pandas series first. The second step is to convert the pandas series to pandas dataframe.

import pandas as pd
data = {'a': 1, 'b': 2}
pd.Series(data).to_frame()

You can even provide a column name.

pd.Series(data).to_frame('ColumnName')

Upvotes: 87

LeandroHumb

Reputation: 873

Just pass the dict in a list:

a = 2
b = 3
df2 = pd.DataFrame([{'A':a,'B':b}])

Upvotes: -3

MicheleDIncecco

Reputation: 109

I had the same problem with numpy arrays and the solution is to flatten them:

data = {
    'b': array1.flatten(),
    'a': array2.flatten(),
}

df = pd.DataFrame(data)
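
The answer does not say what shape the arrays had, so this sketch assumes 2-D column vectors of shape (3, 1), which is one common way to end up with arrays pandas will not take as columns. The values are made up.

```python
import numpy as np
import pandas as pd

# Assumed inputs: column vectors of shape (3, 1), e.g. the result of a reshape.
array1 = np.array([[1], [2], [3]])
array2 = np.array([[4], [5], [6]])

data = {
    'b': array1.flatten(),  # shape (3,) after flattening
    'a': array2.flatten(),
}

df = pd.DataFrame(data)
print(df)
```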

Upvotes: 10

Matthew Connell

Reputation: 137

You could try:

df2 = pd.DataFrame.from_dict({'a':a,'b':b}, orient = 'index')

From the documentation on the 'orient' argument: If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

Upvotes: 12

k0L1081

Reputation: 157

If you intend to convert a dictionary of scalars, you have to include an index:

import pandas as pd

alphabets = {'A': 'a', 'B': 'b'}
index = [0]
alphabets_df = pd.DataFrame(alphabets, index=index)
print(alphabets_df)

Although index is not required for a dictionary of lists, the same idea can be expanded to a dictionary of lists:

planets = {'planet': ['earth', 'mars', 'jupiter'], 'length_of_day': ['1', '1.03', '0.414']}
index = [0, 1, 2]
planets_df = pd.DataFrame(planets, index=index)
print(planets_df)

Of course, for the dictionary of lists, you can build the dataframe without an index:

planets_df = pd.DataFrame(planets)
print(planets_df)

Upvotes: 3

danuker

Reputation: 881

This is because a DataFrame has two intuitive dimensions - the columns and the rows.

You are only specifying the columns using the dictionary keys.

If you only want to specify one dimensional data, use a Series!

Upvotes: 3

Rob

Reputation: 392

Maybe Series would provide all the functions you need:

pd.Series({'A':a,'B':b})

A DataFrame can be thought of as a collection of Series, hence you can:

  • Concatenate multiple Series into one data frame (as described here)

  • Add a Series variable into an existing data frame (example here)
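
A rough sketch of both bullet points, with made-up values and names for the second and third Series:

```python
import pandas as pd

a = 2
b = 3

first = pd.Series({'A': a, 'B': b}, name='first')
second = pd.Series({'A': 10, 'B': 20}, name='second')  # made-up values

# axis=1 places each Series in its own column, aligned on the shared labels.
combined = pd.concat([first, second], axis=1)

# Assigning a Series to a new column also aligns on the index labels.
combined['third'] = pd.Series({'A': 100, 'B': 200})
print(combined)
```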

Upvotes: 21

ingrid

Reputation: 555

If you have a dictionary you can turn it into a pandas data frame with the following line of code:

pd.DataFrame({"key": list(d.keys()), "value": list(d.values())})

Upvotes: -2

fAX

Reputation: 1481

You can also use pd.DataFrame.from_records, which is more convenient when you already have the dictionary in hand:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }])

You can also set the index, if you want, by:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }], index='A')
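
To make the effect of index='A' concrete, here is a small sketch using the values from the question. As far as I can tell, the named field is moved out of the columns and used as the row label.

```python
import pandas as pd

a = 2
b = 3

# index='A' names a field in the records; that field becomes the row label.
df = pd.DataFrame.from_records([{'A': a, 'B': b}], index='A')

print(df)
print(df.index.name, list(df.columns))
```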

Upvotes: 113

ely

Reputation: 77494

You need to provide iterables as the values for the Pandas DataFrame columns:

df2 = pd.DataFrame({'A':[a],'B':[b]})

Upvotes: 11
