lhk

Reputation: 30056

Pandas: append row to DataFrame with multiindex in columns

I have a DataFrame with a multiindex in the columns and would like to use dictionaries to append new rows.

Let's say that each row in the DataFrame is a city. The columns have two levels, "distance" and "vehicle", and each cell holds the percentage of the population that chooses this vehicle for this distance.

I'm constructing an index like this:

index_tuples=[]

for distance in ["near", "far"]:
    for vehicle in ["bike", "car"]:
        index_tuples.append([distance, vehicle])

index = pd.MultiIndex.from_tuples(index_tuples, names=["distance", "vehicle"])

Then I'm creating a dataframe:

dataframe = pd.DataFrame(index=["city"], columns = index)

The structure of the dataframe looks good, although pandas seems to have added NaN as the default value in every cell?

[Image: layout of the dataframe]

Now I would like to set up a dictionary for the new city and add it:

my_home_city = {"near":{"bike":1, "car":0},"far":{"bike":0, "car":1}}
dataframe["my_home_city"] = my_home_city

But this fails:

ValueError: Length of values does not match length of index

Here is the complete error message (pastebin)

UPDATE:

Thank you for all the good answers. I'm afraid I've oversimplified the problem in my example. Actually my index is nested with 3 levels (and it could become more).

So I've accepted the universal answer of converting my dictionary into a list of tuples. This might not be as clean as the other approaches but works for any multiindex setup.
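
The tuple conversion mentioned in this update can be sketched for any nesting depth. This is a minimal illustration, not code from the accepted answer: the helper name flatten and the third level ("weekday"/"weekend") are invented here for the example.

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key_tuple, value) pairs for every leaf of a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level, remembering the keys seen so far.
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical three-level data; the outer "weekday"/"weekend" level is an assumption.
my_home_city = {
    "weekday": {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}},
    "weekend": {"near": {"bike": 1, "car": 0}, "far": {"bike": 1, "car": 0}},
}

# Tuple keys become a MultiIndex when the dict is turned into a Series.
flat = dict(flatten(my_home_city))
row = pd.Series(flat, name="my_home_city").to_frame().T
print(row)
```

The resulting one-row frame has one column level per nesting level, so the same helper should work unchanged if more levels are added.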

Upvotes: 8

Views: 11813

Answers (6)

Arnav Das

Reputation: 301

Just to add to the other answers, here is another (perhaps not very different) simple example, written to be more easily reproducible:

import itertools as it
from IPython.display import display  # only used to display the output
import numpy as np
import pandas as pd

col_1, col_2 = ['A', 'B'], ['C', 'D']
arr_size = len(col_2)
col = pd.MultiIndex.from_product([col_1, col_2])
tmp_df = pd.DataFrame(columns=col)
display(tmp_df)

for s in range(3):  # number of rows to add to tmp_df
    tmp_dict = {x : [np.random.random_sample(1)[0] for i in range(arr_size)] for x in range(arr_size)}
    tmp_ser = pd.Series(it.chain.from_iterable([tmp_dict[x] for x in tmp_dict]), index=col)
    #  display(tmp_dict, tmp_ser)
    tmp_df = tmp_df.append(tmp_ser[tmp_df.columns], ignore_index=True)

display(tmp_df)

Some things to note about the code above:

  • The number of items to add should always match len(col_1)*len(col_2), that is, the number of columns produced by the product of the lists your multi-index is built from.
  • list(it.chain.from_iterable([[2, 3], [4, 5]])) simply flattens the nested list into [2, 3, 4, 5].
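
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above fails on current versions. A rough equivalent, assuming the goal is still three rows of random values, collects the rows first and combines them once with pd.concat. The seeded generator rng is an addition for reproducibility and is not part of the original example.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_product([["A", "B"], ["C", "D"]])
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

rows = []
for _ in range(3):  # number of rows to add
    # One value per column, labelled with the MultiIndex so it aligns later.
    rows.append(pd.Series(rng.random(len(col)), index=col))

# Each Series becomes a column under axis=1; transposing turns them into rows.
tmp_df = pd.concat(rows, axis=1).T
print(tmp_df)
```

Building the frame once at the end also avoids copying the whole DataFrame on every iteration, which is what repeated appends did.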

Upvotes: 1

Yanni Papadakis

Reputation: 135

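Try this workaround:

  • append to a dict
  • then convert it to a pandas data frame
  • at the very last step, select the desired columns to create the multi-index with set_index()

# predictor_types and ames are not defined in this answer; they appear to be
# a dict mapping a type to column names and a DataFrame, respectively.
d = dict()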
for g in predictor_types:
    for col in predictor_types[g]:
        tot = len(ames) - ames[col].count()
        if tot:
            d.setdefault('type',[]).append(g)
            d.setdefault('predictor',[]).append(col)
            d.setdefault('missing',[]).append(tot)
pd.DataFrame(d).set_index(['type','predictor']).style.bar(color='DodgerBlue')
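
One way this workaround might be adapted to the question's data is sketched below. It is an interpretation rather than the answerer's code: the cities dict and the "share" column name are assumptions, and the final unstack is added because set_index alone builds the multi-index on the rows, while the question wants it on the columns.

```python
import pandas as pd

# Assumed input: one nested dict per city, keyed by city name.
cities = {
    "my_home_city": {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}},
}

# Step 1: append to a plain dict of lists (one list per future column).
d = {}
for city, by_distance in cities.items():
    for distance, by_vehicle in by_distance.items():
        for vehicle, share in by_vehicle.items():
            d.setdefault("city", []).append(city)
            d.setdefault("distance", []).append(distance)
            d.setdefault("vehicle", []).append(vehicle)
            d.setdefault("share", []).append(share)

# Step 2: convert to a long-format DataFrame.
long_df = pd.DataFrame(d)

# Step 3: build the multi-index, then move distance/vehicle into the columns.
indexed = long_df.set_index(["city", "distance", "vehicle"])["share"]
wide = indexed.unstack(["distance", "vehicle"])
print(wide)
```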

Upvotes: 0

BENY

Reputation: 323226

A MultiIndex is essentially a list of tuples, so we just need to convert your dict to use tuple keys; then we can assign the values directly (df here is your dataframe):

d = {(x,y):my_home_city[x][y] for x in my_home_city for y in my_home_city[x]}
df.loc['my_home_city',:]=d
df
Out[994]: 
distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1

More Info

d
Out[995]: 
{('far', 'bike'): 0,
 ('far', 'car'): 1,
 ('near', 'bike'): 1,
 ('near', 'car'): 0}

df.columns.values
Out[996]: array([('near', 'bike'), ('near', 'car'), ('far', 'bike'), ('far', 'car')], dtype=object)
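
A self-contained variant of the same idea is sketched below. It differs from the snippet above in one assumed detail: the values are passed as a list ordered by dataframe.columns instead of as a dict, so the result does not depend on how a given pandas version treats a dict assigned through .loc.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"]], names=["distance", "vehicle"]
)
dataframe = pd.DataFrame(index=["city"], columns=index)

my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}

# Re-key the nested dict with (distance, vehicle) tuples.
d = {
    (distance, vehicle): share
    for distance, by_vehicle in my_home_city.items()
    for vehicle, share in by_vehicle.items()
}

# Iterating over MultiIndex columns yields tuples, so they can look up d directly.
dataframe.loc["my_home_city"] = [d[col] for col in dataframe.columns]
print(dataframe)
```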

Upvotes: 4

Alexander

Reputation: 109546

Initialize your empty dataframe using MultiIndex.from_product.

distances = ['near', 'far']
vehicles = ['bike', 'car']
df = pd.DataFrame([], columns=pd.MultiIndex.from_product([distances, vehicles]), 
                  index=pd.Index([], name='city'))

Your dictionary converts to a small table (vehicle by distance), so unstack it (which yields a Series), then turn it into a dataframe row by calling to_frame with the relevant city name and transposing the resulting column into a row.

>>> df.append(pd.DataFrame(my_home_city).unstack().to_frame('my_home_city').T)
              far     near    
             bike car bike car
city                          
my_home_city    0   1    1   0
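
Since DataFrame.append was removed in pandas 2.0, the same steps can be sketched with pd.concat. This is an adaptation under that assumption, not the original snippet; depending on the pandas version, concatenating with an empty frame may also emit a FutureWarning.

```python
import pandas as pd

distances = ["near", "far"]
vehicles = ["bike", "car"]
df = pd.DataFrame(
    columns=pd.MultiIndex.from_product([distances, vehicles]),
    index=pd.Index([], name="city"),
)

my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}

# unstack -> Series keyed by (distance, vehicle); to_frame + T -> one labelled row.
row = pd.DataFrame(my_home_city).unstack().to_frame("my_home_city").T
row.index.name = "city"  # match the index name of the empty frame

df = pd.concat([df, row])
print(df)
```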

Upvotes: 1

Scott Boston

Reputation: 153460

You can append to your dataframe like this:

my_home_city = {"near":{"bike":1, "car":0},"far":{"bike":0, "car":1}}
dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))

Output:

distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1

The trick is to create the dataframe row with from_dict, then unstack to get the structure of your original dataframe with multiindex columns, then rename to set the index label, and finally append.

Or, if you don't want to create the empty dataframe first, you can use this method to build the dataframe directly from the new data.

pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city').to_frame().T

Output:

              far     near    
             bike car bike car
my_home_city    0   1    1   0

Explained:

pd.DataFrame.from_dict(my_home_city)

      far  near
bike    0     1
car     1     0

Now, let's unstack to create the multiindex and bring the new data into the structure of the original dataframe.

pd.DataFrame.from_dict(my_home_city).unstack()

far   bike    0
      car     1
near  bike    1
      car     0
dtype: int64

We use rename to give that series a name, which becomes the index label of the dataframe row when it is appended to the original dataframe.

pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city')

far   bike    0
      car     1
near  bike    1
      car     0
Name: my_home_city, dtype: int64

Now, if you converted that series to a frame and transposed it, it would look very much like a new row. However, there is no need to do this, because pandas aligns data on labels automatically, so appending this series to the dataframe will align it with the columns and add the new record.

dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))

distance     near       far     
vehicle      bike  car bike  car
city          NaN  NaN  NaN  NaN
my_home_city    1    0    0    1
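
The label alignment described above also applies to pd.concat, which is the replacement for append on pandas 2.0 and later. The sketch below is an illustration under assumptions: other_city and its values are hypothetical, the helper as_row is invented here, and the keys are deliberately written in a different order to show that rows are matched on column labels rather than position.

```python
import pandas as pd

my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}
# Hypothetical second city with its keys in a different order.
other_city = {"far": {"car": 0.25, "bike": 0.75}, "near": {"car": 0.5, "bike": 0.5}}


def as_row(nested, label):
    """Turn a nested dict into a one-row frame with MultiIndex columns."""
    return pd.DataFrame.from_dict(nested).unstack().rename(label).to_frame().T


# concat matches the rows on their (distance, vehicle) column labels.
dataframe = pd.concat(
    [as_row(my_home_city, "my_home_city"), as_row(other_city, "other_city")]
)
print(dataframe)
```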

Upvotes: 4

cs95

Reputation: 402483

I don't think you even need to initialise an empty dataframe. With your nested dictionary (called d here), I can get your desired output with unstack and a transpose:

pd.DataFrame(d).unstack().to_frame().T

   far     near    
  bike car bike car
0    0   1    1   0

Upvotes: 2
