Jason Leaver

Reputation: 314

pandas: mask a DataFrame by the index of a differently shaped DataFrame

Is there a built-in way in pandas to keep only the rows of one DataFrame whose index labels also appear in another DataFrame of a different shape?

I'd prefer to avoid approaches built on pd.concat([...], 1) and .all(1), because the dataset I'm working with has missing data points.

main.py

import pandas as pd
import numpy as np
import numpy.typing as npt

def _index_mask(index_a: pd.Index, index_b: pd.Index) -> npt.NDArray[np.bool_]:
    # True for each label of index_b that also appears in index_a.
    return index_b.isin(index_a[index_a.isin(index_b)])


def mask_b(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    return b[_index_mask(a.index, b.index)]


if __name__ == '__main__':

    frame_a = pd.DataFrame(
        np.arange(10).reshape(5, 2),
        index=["A", "B", "C", "D", "E"]
    )
    frame_b = pd.DataFrame(
        np.arange(16).reshape(8, 2),
        index=["F", "G", "H", "C", "D", "E", "I", "J"]
    )
    x = mask_b(frame_a, frame_b)
    print(x)

Edit:

I forgot to mention that I also need to perform the inverse operation on frame_a, so that both frames end up with the same index:

def _index_mask(index_a: pd.Index, index_b: pd.Index) -> tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]]:
    # One boolean mask per index, each marking the labels shared with the other index.
    return index_a.isin(index_b), index_b.isin(index_a)


# Named keep_a/keep_b so the mask_b function defined earlier is not shadowed.
keep_a, keep_b = _index_mask(frame_a.index, frame_b.index)
frame_a = frame_a[keep_a]
frame_b = frame_b[keep_b]
assert all(frame_b.index == frame_a.index)

Result:

    0   1
C   6   7
D   8   9
E  10  11

Upvotes: 0

Views: 41

Answers (1)

user17242583

I think you need pd.Index.intersection:

x = frame_b.loc[frame_a.index.intersection(frame_b.index)]

Output:

>>> x
    0   1
C   6   7
D   8   9
E  10  11
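
For the edit in the question (filtering frame_a as well), the same intersection can probably be reused for both frames. The following is only a sketch: it rebuilds the two example frames from the question, the names common, a_common and b_common are made up for illustration, and it assumes the index labels are unique within each frame.

```python
import numpy as np
import pandas as pd

# Example frames copied from the question.
frame_a = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=["A", "B", "C", "D", "E"],
)
frame_b = pd.DataFrame(
    np.arange(16).reshape(8, 2),
    index=["F", "G", "H", "C", "D", "E", "I", "J"],
)

# Labels present in both indexes (hypothetical name: common).
common = frame_a.index.intersection(frame_b.index)

# Select the shared rows from each frame with the same label list,
# so both results carry an identical index.
a_common = frame_a.loc[common]
b_common = frame_b.loc[common]

assert a_common.index.equals(b_common.index)
print(a_common)
print(b_common)
```

One difference from the boolean-mask version in the question: .loc returns rows in the order of the labels passed to it, whereas b[mask] keeps frame_b's own row order. In this example both give C, D, E, but with differently ordered indexes the results could differ.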

Upvotes: 1
