Pandas: Remove Rows Where the Value of One Column Appears on Any Row in Another

Question

Example data is:

000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108

You can see that some values in the first column also appear in the fourth column. I want to remove those rows, where the value in the fourth column also appears on any row in the first column.

Therefore, in this example, following rows should be removed:

000000008,2, 1,000000010
000000010,1, 1,000000008
000000012,3, 1,000000009
000000009,1, 1,000000011

Code starting point:

import numpy as np
import pandas as pd

T = u'''000000008,2, 1,000000010
    000000009,1, 1,000000011
    000000010,1, 1,000000008
    000000011,2, 1,000000032
    000000012,3, 1,000000009
    000000013,2, 1,000000108'''

from io import StringIO
df = pd.read_csv(StringIO(T), header=None)
print(df)

sacuL · Accepted Answer

IIUC, from your description, you can do:

df[~df.iloc[:,3].isin(df.iloc[:,0])]

Which returns:

    0  1  2    3
3  11  2  1   32
5  13  2  1  108

Contrary to your desired output, this removes the row with 000000011, but not the one with 000000108, because 000000011 is found in both columns, but 000000108 is not

Pandas: Remove Rows Where the Value of One Column Appears on Any Row in Another

Answers (1)

Related Questions