NewGuy

Reputation: 3423

Dictionary of lists to dataframe

I have a dictionary in which each key holds a list of float values. The lists are not all the same length.

I'd like to convert this dictionary to a pandas dataframe so that I can easily run analysis functions on the data (min, max, average, standard deviation, and more).

My dictionary looks like this:

{
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

What is the best way to get this into a dataframe so that I can utilize basic functions like sum, mean, describe, std?

The examples I have found (like the link above) all assume that every key has the same number of values in its list.

Upvotes: 54

Views: 67524

Answers (5)

thebitsdontfit

Reputation: 143

Use

df = pd.DataFrame.from_dict(d, orient='columns')

or, as 'orient' is set to 'columns' by default, just use

df = pd.DataFrame.from_dict(d)
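
A quick check of this call, assuming the question's dictionary is bound to `d`. With lists of different lengths, the default `orient='columns'` raises a `ValueError` on the pandas versions I am aware of, so it seems to fit only equal-length lists. The sketch below records that error and then uses `orient='index'`, which pads the shorter rows with NaN. The names `columns_error` and `df` are only illustrative.

```python
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# orient='columns' (the default) expects every list to have the same length.
try:
    pd.DataFrame.from_dict(d)
    columns_error = None
except ValueError as exc:
    columns_error = exc

# orient='index' builds one row per key and pads short rows with NaN;
# transposing turns the keys back into columns.
df = pd.DataFrame.from_dict(d, orient='index').T
print(df)
```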

Upvotes: 3

aerijman

Reputation: 2762

With counts being your dictionary, you can:

define the index as

idx = counts.keys()

then concatenate one Series per key (the transpose puts each key on its own row)

df = pd.concat([pd.Series(counts[i]) for i in idx], axis=1).T

and lastly set the index

df.index = idx
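
Put together as a runnable sketch, assuming `counts` is the dictionary from the question. Because of the transpose, each key ends up as a row, so per-key statistics are taken across the columns with `axis=1`. The name `row_means` is only illustrative.

```python
import pandas as pd

counts = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

idx = list(counts.keys())

# One Series per key, placed side by side, then transposed so keys are rows.
df = pd.concat([pd.Series(counts[k]) for k in idx], axis=1).T
df.index = idx

# Keys are rows here, so aggregate across the columns; NaN padding is skipped.
row_means = df.mean(axis=1)
print(row_means)
```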

Upvotes: 2

Miriam Farber

Reputation: 19634

import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

df = pd.DataFrame.from_dict(d, orient='index').transpose()

Then df is

    key3    key2    key1
0   1.00    72.5    10.00
1   5.20    NaN     100.10
2   71.20   NaN     0.98
3   9.00    NaN     1.20
4   10.11   NaN     NaN
5   12.21   NaN     NaN
6   65.00   NaN     NaN
7   7.00    NaN     NaN

Note that NumPy has built-in functions that ignore NaN values in their calculations, which is relevant here because the shorter columns are padded with NaN. For example, to find the mean of the 'key1' column:

import numpy as np
np.nanmean(df[['key1']])
28.07

Other useful functions include numpy.nanstd, numpy.nanvar, numpy.nanmedian, numpy.nansum.

EDIT: Note that the pandas functions from your basic functions link can also handle NaN values. However, their estimators may differ from NumPy's. For example, pandas computes the unbiased estimator of the sample variance (dividing by n - 1, i.e. ddof=1 by default), while the NumPy version divides by n (ddof=0 by default).
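
A small sketch of that difference, using the 'key1' values plus one NaN of padding. The variable names are only illustrative. Passing `ddof=1` to the NumPy function makes the two results agree.

```python
import numpy as np
import pandas as pd

key1 = pd.Series([10, 100.1, 0.98, 1.2, np.nan])

# pandas skips NaN and divides by n - 1 (ddof=1) by default.
pandas_std = key1.std()

# numpy.nanstd ignores NaN and divides by n (ddof=0) by default.
numpy_std = np.nanstd(key1.to_numpy())

# Asking NumPy for ddof=1 reproduces the pandas value.
numpy_std_ddof1 = np.nanstd(key1.to_numpy(), ddof=1)

print(pandas_std, numpy_std, numpy_std_ddof1)
```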

Upvotes: 70

piRSquared

Reputation: 294258

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

key1  0     10.00
      1    100.10
      2      0.98
      3      1.20
key2  0     72.50
key3  0      1.00
      1      5.20
      2     71.20
      3      9.00
      4     10.11
      5     12.21
      6     65.00
      7      7.00
dtype: float64
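
With this stacked layout, per-key statistics can be taken by grouping on the outer index level. A minimal sketch, where the names `stacked` and `stats` are only illustrative:

```python
import pandas as pd

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# Long-form Series with a two-level index: (key, position in the list).
stacked = pd.concat({k: pd.Series(v) for k, v in your_dict.items()})

# Group on the outer level (the dictionary key) and aggregate each group.
stats = stacked.groupby(level=0).agg(['count', 'min', 'max', 'mean', 'std'])
print(stats)
```

No NaN padding is involved here, since each key keeps only its own values. The std for 'key2' comes out as NaN because a single value has no sample standard deviation.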

Or with axis=1

your_dict = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7]
}

pd.concat({k: pd.Series(v) for k, v in your_dict.items()}, axis=1)

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00

Upvotes: 22

John Zwinck

Reputation: 249153

I suggest you just create a dict of Series (here x is your dictionary), since your keys do not have the same number of values:

{ key: pd.Series(val) for key, val in x.items() }

You can then do Pandas operations on each column individually.
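
For instance, a sketch of working on each Series separately, assuming `x` is the question's dictionary. The names `series_by_key`, `means` and `longest` are only illustrative.

```python
import pandas as pd

x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

series_by_key = {key: pd.Series(val) for key, val in x.items()}

# Each Series keeps its own length, so no NaN padding is needed.
means = {key: s.mean() for key, s in series_by_key.items()}

# Key whose Series holds the most values.
longest = max(series_by_key, key=lambda k: len(series_by_key[k]))

print(means)
print(series_by_key[longest].describe())
```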

Once you have that, if you really want a DataFrame, you can:

pd.DataFrame({ key: pd.Series(val) for key, val in x.items() })

     key1  key2   key3
0   10.00  72.5   1.00
1  100.10   NaN   5.20
2    0.98   NaN  71.20
3    1.20   NaN   9.00
4     NaN   NaN  10.11
5     NaN   NaN  12.21
6     NaN   NaN  65.00
7     NaN   NaN   7.00
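
On a frame padded like this, the pandas aggregations skip NaN by default, so the functions named in the question can be applied directly. A minimal sketch, with `x` again assumed to be the question's dictionary and `summary` and `totals` as illustrative names:

```python
import pandas as pd

x = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

df = pd.DataFrame({key: pd.Series(val) for key, val in x.items()})

# describe() reports count, mean, std, min, quartiles and max per column;
# the count row shows how many real (non-NaN) values each key has.
summary = df.describe()

# sum() likewise ignores the NaN padding.
totals = df.sum()

print(summary)
print(totals)
```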

Upvotes: 5
