Amelio Vazquez-Reina

Reputation: 96300

Creating a dataframe from the full cartesian product of a dictionary

Consider a dictionary holding iterables of different lengths:

{'column_1': range(10),
 'column_2': range(3),
 'column_3': ['foo']}

I would like to create a dataframe that includes the full cartesian product of these entries. That is:

column_1, column_2, column_3
       0         0     'foo'
       0         1     'foo'
       0         2     'foo'
       1         0     'foo'
       1         1     'foo'
       1         2     'foo'
           ...
       9         2     'foo'

How can I do this in Pandas? Perhaps using collections?

Upvotes: 3

Views: 2179

Answers (2)

Rodalm

Reputation: 5433

This is "a bit" late, but here is a full pandas solution.

First construct a MultiIndex from the cartesian product of the dictionary values, using pandas.MultiIndex.from_product. The dictionary keys are used to name the index levels. Then convert each index level to a DataFrame column using the pandas.MultiIndex.to_frame method, passing index=False so the result gets a plain integer index.

import pandas as pd

d = {
    'column_1': range(10), 
    'column_2': range(3), 
    'column_3': ['foo']
}

df = pd.MultiIndex.from_product(d.values(), names=d.keys()).to_frame(index=False)

Output

>>> df

    column_1  column_2 column_3
0          0         0      foo
1          0         1      foo
2          0         2      foo
3          1         0      foo
4          1         1      foo
5          1         2      foo
6          2         0      foo
7          2         1      foo
8          2         2      foo
9          3         0      foo
10         3         1      foo
11         3         2      foo
12         4         0      foo
13         4         1      foo
14         4         2      foo
15         5         0      foo
16         5         1      foo
17         5         2      foo
18         6         0      foo
19         6         1      foo
20         6         2      foo
21         7         0      foo
22         7         1      foo
23         7         2      foo
24         8         0      foo
25         8         1      foo
26         8         2      foo
27         9         0      foo
28         9         1      foo
29         9         2      foo
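
As a quick sanity check on the approach above, here is a minimal sketch that wraps the one-liner in a helper and confirms the row count equals the product of the input lengths. The name cartesian_frame is made up for illustration; it is not part of pandas.

```python
import pandas as pd


def cartesian_frame(d):
    """Hypothetical helper: one row per combination of the dict's values."""
    # Each dict value becomes one index level, named after its key.
    index = pd.MultiIndex.from_product(list(d.values()), names=list(d.keys()))
    # index=False drops the MultiIndex and leaves a default integer index.
    return index.to_frame(index=False)


d = {
    'column_1': range(10),
    'column_2': range(3),
    'column_3': ['foo']
}

df = cartesian_frame(d)

# 10 * 3 * 1 combinations, one column per dictionary key.
print(df.shape)  # (30, 3)
```

The last column varies fastest, so the rows come out in the same order as nested loops over the dictionary values.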

Upvotes: 1

Padraic Cunningham

Reputation: 180441

Not overly familiar with pandas but this may work:

from collections import OrderedDict
from itertools import product

import pandas as pd

d = {'column_1': range(10),
     'column_2': range(3),
     'column_3': ['foo']}

# Sort by key so the column order is deterministic.
od = OrderedDict(sorted(d.items()))
# Each tuple in the product becomes one row.
cart = list(product(*od.values()))

df = pd.DataFrame(cart, columns=list(od.keys()))
print(df)


    column_1  column_2 column_3
0          0         0      foo
1          0         1      foo
2          0         2      foo
3          1         0      foo
4          1         1      foo
5          1         2      foo
6          2         0      foo
7          2         1      foo
8          2         2      foo
9          3         0      foo
10         3         1      foo
11         3         2      foo
12         4         0      foo
13         4         1      foo
14         4         2      foo
15         5         0      foo
16         5         1      foo
17         5         2      foo
18         6         0      foo
19         6         1      foo
20         6         2      foo
21         7         0      foo
22         7         1      foo
23         7         2      foo
24         8         0      foo
25         8         1      foo
26         8         2      foo
27         9         0      foo
28         9         1      foo
29         9         2      foo
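
Note that sorting the items orders the columns alphabetically by key. On Python 3.7+, where plain dicts preserve insertion order, a variant without OrderedDict keeps the columns in the order the keys were written. This is a sketch under that assumption; the keys 'b' and 'a' are made-up example data chosen so that the two orders differ.

```python
from itertools import product

import pandas as pd

# Keys deliberately not in alphabetical order.
d = {'b': [1, 2], 'a': ['x', 'y', 'z']}

# product(*d.values()) yields one tuple per combination, last value varying fastest.
rows = list(product(*d.values()))

# list(d) gives the keys in insertion order (guaranteed on Python 3.7+).
df = pd.DataFrame(rows, columns=list(d))
print(df)
```

Here the columns come out as b, a rather than a, b, and the frame has 2 * 3 = 6 rows.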

Upvotes: 3
