Reputation: 75
The arrays of data will come from another source, so in this example I have declared them as lists with minimal data. In practice, though, the number of combinations will be much larger, well over a million: it is len(arr1) * len(arr2) * len(arr3), and so on.
import pandas as pd
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
df1 = pd.DataFrame(arr1, columns=["col1"])
df2 = pd.DataFrame(arr2, columns=["col2"])
df3 = pd.DataFrame(arr3, columns=["col3"])
The final DataFrame should contain all possible combinations of the given arrays:
col1 col2 col3
1 10 0.1
1 10 0.2
1 20 0.1
1 20 0.2
2 10 0.1
2 10 0.2
2 20 0.1
2 20 0.2
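Because the result has one row per combination, its size is the product of the list lengths. A quick sketch to check that number before building anything (the 100-element lists in the last line are a hypothetical size, not from the question):

```python
import math

# Example inputs from the question
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
arrays = [arr1, arr2, arr3]

# Rows in the Cartesian product = product of the list lengths
n_rows = math.prod(len(a) for a in arrays)
print(n_rows)  # 8

# Hypothetical: three lists of 100 values each already give a million rows
print(math.prod([100, 100, 100]))  # 1000000
```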
Upvotes: 2
Views: 493
Reputation: 2122
import pandas as pd
from sklearn.model_selection import ParameterGrid
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
# Create a dictionary for the parameter grid
param_grid = {
'col1': arr1,
'col2': arr2,
'col3': arr3
}
# Generate all combinations of the given arrays using ParameterGrid
combinations = list(ParameterGrid(param_grid))
# Convert the combinations to a DataFrame
df_combinations = pd.DataFrame(combinations)
print(df_combinations)
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2
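As far as I know, ParameterGrid essentially runs itertools.product over the dict values (it also sorts the keys, which happens to match col1, col2, col3 here). If you would rather not depend on scikit-learn just for this, a minimal standard-library sketch of the same idea:

```python
from itertools import product

import pandas as pd

arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]

param_grid = {"col1": arr1, "col2": arr2, "col3": arr3}

# product() yields one tuple per combination, with the last list varying fastest;
# the dict keys (in insertion order) become the column names
df_combinations = pd.DataFrame(
    list(product(*param_grid.values())), columns=list(param_grid)
)
print(df_combinations)
```

Unlike ParameterGrid, this keeps the columns in the order you wrote the keys rather than sorting them.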
Upvotes: 0
Reputation: 28644
expand_grid from pyjanitor is a fast implementation of the Cartesian product and uses np.meshgrid under the hood.
# pip install pyjanitor
import pandas as pd
import janitor as jn
# expand_grid requires a dictionary:
others = {"df1": df1, "df2": df2, "df3": df3}
jn.expand_grid(others=others).droplevel(level=1, axis=1)
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2
expand_grid can also compute the Cartesian product of DataFrames with Series, and even with non-pandas objects. Its end product, though, is always a DataFrame.
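This is not pyjanitor's code, only a sketch of the same np.meshgrid idea applied directly to the plain lists from the question, for cases where adding pyjanitor is not an option:

```python
import numpy as np
import pandas as pd

arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]

# indexing="ij" keeps the axes in argument order, so after ravel()
# (C order) the last array varies fastest, as in the expected output
grids = np.meshgrid(arr1, arr2, arr3, indexing="ij")
df = pd.DataFrame(
    {name: g.ravel() for name, g in zip(["col1", "col2", "col3"], grids)}
)
print(df)
```

Each grid keeps the dtype of its own input, so col1 and col2 stay integer while col3 is float.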
Upvotes: 0
Reputation: 294258
functools.reduce and pd.merge
Kind of slow.
import pandas as pd
from functools import reduce
reduce(
pd.merge,
[d.assign(dummy=1)
for d in [df1, df2, df3]
]).drop('dummy', axis=1)
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2
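On pandas 1.2 or later, merge accepts how="cross", which builds the Cartesian product directly, so the dummy key column is no longer needed. A sketch of the same reduce pattern with that option:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame([1, 2], columns=["col1"])
df2 = pd.DataFrame([10, 20], columns=["col2"])
df3 = pd.DataFrame([0.1, 0.2], columns=["col3"])

# Cross-join the frames pairwise: ((df1 x df2) x df3)
result = reduce(lambda left, right: left.merge(right, how="cross"), [df1, df2, df3])
print(result)
```

It still creates an intermediate DataFrame at each step, so I would not expect it to be dramatically faster than the dummy-key version.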
itertools.product and pd.DataFrame.itertuples
Definitely faster
import pandas as pd
from itertools import product
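def tupify(d):
    # Yield each row as a plain tuple, e.g. (1,)
    return d.itertuples(index=False, name=None)
def sumtup(t):
    # Flatten a tuple of row tuples into one row: ((1,), (10,), (0.1,)) -> (1, 10, 0.1)
    return sum(t, start=())
pd.DataFrame(
    list(map(sumtup, product(*map(tupify, [df1, df2, df3])))),
    columns=sum(map(list, [df1, df2, df3]), start=[])  # all column names, in order
)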
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2
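Like the itertuples version, the sketch below also works when each DataFrame has several columns, but it selects rows by position with NumPy instead of building Python tuples. cross_frames is a hypothetical helper name, and I have not benchmarked it against the answers here:

```python
import numpy as np
import pandas as pd


def cross_frames(*frames):
    """Hypothetical helper: Cartesian product of the rows of several DataFrames."""
    sizes = [len(f) for f in frames]
    n_rows = int(np.prod(sizes))
    # One array of row positions per frame; C-order flattening makes
    # the last frame vary fastest
    positions = np.indices(sizes).reshape(len(sizes), n_rows)
    parts = [
        f.iloc[pos].reset_index(drop=True) for f, pos in zip(frames, positions)
    ]
    # All parts share a fresh RangeIndex, so they line up side by side
    return pd.concat(parts, axis=1)


df1 = pd.DataFrame([1, 2], columns=["col1"])
df2 = pd.DataFrame([10, 20], columns=["col2"])
df3 = pd.DataFrame([0.1, 0.2], columns=["col3"])
print(cross_frames(df1, df2, df3))
```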
Upvotes: 2
Reputation: 31166
pandas.MultiIndex.from_product (https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_product.html) effectively does what you want, and it is then simple to convert the result to a DataFrame.
import pandas as pd
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
pd.DataFrame(
    index=pd.MultiIndex.from_product([arr1, arr2, arr3], names=["col1", "col2", "col3"])
).reset_index()
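An equivalent way to turn the MultiIndex into ordinary columns, without creating an empty DataFrame first, is MultiIndex.to_frame with index=False. A minimal sketch:

```python
import pandas as pd

arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]

index = pd.MultiIndex.from_product([arr1, arr2, arr3], names=["col1", "col2", "col3"])
# index=False turns each level into a column and gives the result a RangeIndex
df = index.to_frame(index=False)
print(df)
```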
Upvotes: 3