Why is getting the reverse of an index in pandas so slow?

Question

I have a pandas dataframe that I'm using to store network data; it looks like:

from_id, to_id, count
X, Y, 3
Z, Y, 4
Y, X, 2
...

I am trying to add a new column, inverse_count, which gets the count value for the row where the from_id and to_id are reversed from the current row.

I'm taking the following approach. I thought that it would be fast but it is much slower than I anticipated, and I can't figure out why.

def get_inverse_val(x):
    # Takes the inverse of the index for a given row
    # When passed to apply with axis = 1, the index becomes the name
    try:
        return df.loc[(x.name[1], x.name[0]), 'count']
    except KeyError:
        return 0

df = df.set_index(['from_id', 'to_id'])

df['inverse_count'] = df.apply(get_inverse_val, axis = 1)

Ben · Accepted Answer

Why not do a simple merge for this?

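import pandas as pd

df = pd.DataFrame({'from_id': ['X', 'Z', 'Y'], 'to_id': ['Y', 'Y', 'X'], 'count': [3, 4, 2]})

# Self-merge: match each row's (from_id, to_id) against the reversed pair (to_id, from_id)
pd.merge(
    left=df,
    right=df,
    how='left',
    left_on=['from_id', 'to_id'],
    right_on=['to_id', 'from_id'],
)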

  from_id_x to_id_x  count_x from_id_y to_id_y  count_y
0         X       Y        3         Y       X      2.0
1         Z       Y        4       NaN     NaN      NaN
2         Y       X        2         X       Y      3.0

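Here we merge (from_id, to_id) against (to_id, from_id) to pair each row with its reversed counterpart; count_y is then the inverse count, and it is NaN where no reversed row exists. In general, you should avoid apply() with axis=1, as it's slow. To understand why, realize that it is not a vectorized operation: it calls your Python function once per row, and in your case every call also performs its own .loc lookup.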
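To get from the merged frame to the single inverse_count column the question asks for, something like the following should work. It is only a sketch: it assumes each (from_id, to_id) pair appears at most once (otherwise the merge would duplicate rows), and the names reversed_counts and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "from_id": ["X", "Z", "Y"],
    "to_id": ["Y", "Y", "X"],
    "count": [3, 4, 2],
})

# Swap the key column names so that each row's count is filed under
# the reversed pair, and call that value inverse_count.
reversed_counts = df.rename(
    columns={"from_id": "to_id", "to_id": "from_id", "count": "inverse_count"}
)

# A left merge keeps every original row; pairs with no reversed row get NaN.
result = df.merge(reversed_counts, on=["from_id", "to_id"], how="left")

# Match the question's fallback of 0 for missing reversed pairs.
result["inverse_count"] = result["inverse_count"].fillna(0).astype(int)

print(result)
```

Renaming before the merge avoids the _x/_y suffixes, so the result keeps the original three columns plus inverse_count.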
