Reputation: 240
I am trying to find an efficient way of multiplying each combination of columns within a pandas dataframe. I have managed to achieve this with itertools, but it slows down dramatically as the dataframe grows. I will need to do this on a dataframe of shape roughly (100, 1000).
A working example on a smaller dataframe is below:
import numpy as np
import pandas as pd
from itertools import combinations_with_replacement
df = pd.DataFrame(np.random.randn(3, 10))
new_df = pd.DataFrame()
for p in combinations_with_replacement(df.columns, 2):
    title = p
    new_df[title] = df[p[0]] * df[p[1]]
Does anybody have any suggestions on how this could be done more efficiently?
Upvotes: 0
Views: 1254
Reputation: 4343
Combining index view and array.prod(axis), this runs ~100 times faster:
def f1():
    # with loop
    new_df = pd.DataFrame()
    for p in combinations_with_replacement(df.columns, 2):
        title = p
        new_df[title] = df[p[0]] * df[p[1]]
    return new_df
def f2():
    n = len(df.columns)
    # (i, j) positions of the upper triangle, diagonal included
    ix = np.indices((n, n))[:, ~np.tri(n, k=-1, dtype=bool)]
    # gather both columns of each pair, then multiply them along axis 1
    return pd.DataFrame(df.values.T[ix.T].prod(1).T, columns=list(map(tuple, ix.T)))
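The same idea can also be sketched with np.triu_indices, which returns the upper-triangle positions directly. This is a swapped-in variant, not the code timed above: the function name pairwise_products is hypothetical, and it labels the result with the original column labels rather than their positions.

```python
import numpy as np
import pandas as pd
from itertools import combinations_with_replacement

def pairwise_products(df):
    """Multiply every unordered column pair (a column with itself included)."""
    values = df.to_numpy()
    # Positions of the upper triangle, diagonal included, in row-major order.
    i, j = np.triu_indices(values.shape[1])
    # Fancy indexing builds two (rows, pairs) arrays; multiply them elementwise.
    products = values[:, i] * values[:, j]
    # Label each result column with the pair of original column labels.
    labels = pd.MultiIndex.from_arrays([df.columns[i], df.columns[j]])
    return pd.DataFrame(products, index=df.index, columns=labels)

df = pd.DataFrame(np.random.randn(3, 10))
result = pairwise_products(df)

# The pairs come out in the same order as the itertools loop produces them.
expected = list(combinations_with_replacement(df.columns, 2))
assert list(result.columns) == expected
```

For a (100, 1000) dataframe the result has 500,500 columns, so about 50 million float64 values (roughly 400 MB), and the two intermediate arrays are the same size again, so memory may matter as much as speed.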
Upvotes: 1