Reputation: 1373
Is there a faster way to iterate over the rows of a pandas DataFrame pairwise to do some calculations? My code below is not fast enough. I wonder if there is a pandas workaround for this.
I started with iterrows, then found itertuples to be faster, but it is still not fast enough.
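```python
import pandas as pd

def pairwisecalculate(df):
    """Fraction of equal column values for every ordered pair of rows."""
    n_cols = len(df.columns)
    sim = []
    # index=False so each tuple holds only column values (no Index at position 0)
    for row_1 in df.itertuples(index=False):
        for row_2 in df.itertuples(index=False):
            matches = sum(a == b for a, b in zip(row_1, row_2))
            sim.append(matches / n_cols)
    return sim
```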
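The nested loops above perform O(n² · m) comparisons at the Python level, which is why even itertuples stays slow. A common way to speed this up is to let NumPy broadcast the comparisons. Below is a minimal sketch, assuming the intended metric is the share of columns (index excluded) on which two rows hold equal values; the function name pairwise_similarity is hypothetical.

```python
import numpy as np

def pairwise_similarity(values):
    """Share of equal column values for every pair of rows.

    values: 2-D array-like of shape (n_rows, n_cols), e.g. df.to_numpy().
    Returns an (n_rows, n_rows) float array where result[i, j] is the
    fraction of columns in which row i and row j are equal.
    """
    arr = np.asarray(values)
    n_rows, n_cols = arr.shape
    matches = np.zeros((n_rows, n_rows), dtype=np.int64)
    # Loop over columns only: each step compares one column against itself
    # for all row pairs at once by broadcasting (n, 1) against (1, n).
    for j in range(n_cols):
        col = arr[:, j]
        matches += col[:, None] == col[None, :]
    return matches / n_cols
```

With a DataFrame, call pairwise_similarity(df.to_numpy()); result.ravel() yields a flat list in the same order as the nested loops (outer row first, then inner row). Only one n_rows × n_rows matrix is held at a time, so memory grows quadratically with the number of rows; very large frames would need to be processed in blocks of rows.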
Upvotes: 0
Views: 358
Reputation: 317
You can also try Polars (https://www.pola.rs/), for example its rolling window functions (https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Series.rolling_var.html):
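```python
import polars as pl
s = pl.Series("A", [1.0, 2.0, 9.0, 2.0, 13.0])
# Rolling standard deviation over windows of 3 (rolling_apply is named rolling_map in newer Polars)
s.rolling_apply(function=lambda s: s.std(), window_size=3)
```

```
shape: (5,)
Series: 'A' [f64]
[
	null
	null
	4.358898943540674
	4.041451884327381
	5.5677643628300215
]
```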
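The rolling_apply call above invokes a Python lambda once per window, which limits its speed. Polars also has built-in rolling aggregations such as rolling_std that avoid the per-window callback. For comparison, here is a minimal sketch of the same computation in plain NumPy, assuming NumPy 1.20 or later (for sliding_window_view) and the sample standard deviation (ddof=1); the function name rolling_std is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_std(values, window_size):
    """Sample standard deviation (ddof=1) over each trailing window.

    The first window_size - 1 positions are NaN, mirroring the nulls
    in the Polars output above.
    """
    arr = np.asarray(values, dtype=float)
    # Shape (len(arr) - window_size + 1, window_size): one row per window,
    # created as a strided view without copying the data.
    windows = sliding_window_view(arr, window_size)
    stds = windows.std(axis=1, ddof=1)
    return np.concatenate([np.full(window_size - 1, np.nan), stds])

print(rolling_std([1.0, 2.0, 9.0, 2.0, 13.0], window_size=3))
```

All windows are reduced in one vectorized call, so no Python code runs per window.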
If you are aiming for speed, you can also look at other Apache Arrow implementations (https://arrow.apache.org/docs/python/pandas.html).
Upvotes: 2