M.M

Reputation: 1373

Iterating over pandas DataFrame rows pairwise

Is there a faster way to iterate over the rows of a pandas DataFrame pairwise to do some calculations? My code below is not fast enough, and I wonder if there is a pandas workaround for this.

I started with iterrows, then found that itertuples is faster, but it is still not fast enough.


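import pandas as pd

def pairwisecalculate(df):
    # fraction of columns on which each ordered pair of rows agrees
    n_cols = len(df.columns)
    sim = []
    # index=False: yield only the column values, without the index label
    for row_1 in df.itertuples(index=False):
        for row_2 in df.itertuples(index=False):
            matches = 0
            for a, b in zip(row_1, row_2):
                if a == b:
                    matches += 1
            sim.append(matches / n_cols)
    return sim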
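A vectorized sketch of the same all-pairs comparison, assuming the goal is the fraction of columns on which each pair of rows agrees: NumPy broadcasting compares every row with every other row in a single operation instead of three nested Python loops. `pairwise_match_fraction` is a hypothetical helper name, and the intermediate boolean array has n_rows × n_rows × n_cols elements, so for large frames you would probably need to process the rows in chunks.

```python
import numpy as np
import pandas as pd


def pairwise_match_fraction(df: pd.DataFrame) -> np.ndarray:
    """Return an (n, n) array whose [i, j] entry is the fraction of
    columns on which row i and row j hold equal values."""
    values = df.to_numpy()
    # (n, 1, m) == (1, n, m) broadcasts to an (n, n, m) boolean array
    eq = values[:, None, :] == values[None, :, :]
    # averaging the booleans over the column axis gives the match fraction
    return eq.mean(axis=2)


df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 3]})
sim = pairwise_match_fraction(df)
print(sim)
```

`sim.ravel().tolist()` gives a flat list in the same order as the nested loops (row_1 outer, row_2 inner). As in the loop, NaN never equals NaN, so missing values are not counted as matches.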

Upvotes: 0

Views: 358

Answers (2)

otluk

Reputation: 317

You can also try Polars (https://www.pola.rs/), which provides rolling-window functions such as rolling_var (https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Series.rolling_var.html).

Source: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Series.rolling_apply.html#polars.Series.rolling_apply

import polars as pl

s = pl.Series("A", [1.0, 2.0, 9.0, 2.0, 13.0])
s.rolling_apply(function=lambda s: s.std(), window_size=3)
shape: (5,)
Series: 'A' [f64]
[
    null
    null
    4.358898943540674
    4.041451884327381
    5.5677643628300215
]

If you are aiming for speed, you could also look at other Apache Arrow implementations (https://arrow.apache.org/docs/python/pandas.html).

Upvotes: 2

Corralien

Reputation: 120469

You can try:

df.rolling(2).sum() / (len(df.columns) - 1)
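A `rolling(2)` window spans two consecutive rows, so it relates each row only to the row before it, and `.sum()` adds the values rather than counting equal ones. If a match fraction between consecutive rows is what you are after, a minimal sketch using `DataFrame.eq` together with `shift` could look like this; `consecutive_match_fraction` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def consecutive_match_fraction(df: pd.DataFrame) -> pd.Series:
    """For each row, the fraction of columns equal to the previous row."""
    # shift() moves every row down by one, so row i lines up with row i - 1
    matches = df.eq(df.shift())
    frac = matches.sum(axis=1) / len(df.columns)
    # the first row has no predecessor, so mark it as missing
    if len(frac):
        frac.iloc[0] = float("nan")
    return frac


df = pd.DataFrame({"a": [1, 1, 1], "b": [3, 4, 4]})
frac = consecutive_match_fraction(df)
print(frac)
```

This stays fully vectorized, but unlike the question's code it only compares neighbouring rows, not every pair.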

Upvotes: 2
