Reputation: 2008
Example data is:
000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108
You can see that some values in the first column also appear in the fourth column. I want to remove those rows, where the value in the fourth column also appears on any row in the first column.
Therefore, in this example, following rows should be removed:
000000008,2, 1,000000010
000000010,1, 1,000000008
000000012,3, 1,000000009
000000009,1, 1,000000011
Code starting point:
import numpy as np
import pandas as pd
T = u'''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''
from io import StringIO
df = pd.read_csv(StringIO(T), header=None)
print(df)
Upvotes: 0
Views: 90
Reputation: 51395
IIUC, from your description, you can do:
df[~df.iloc[:,3].isin(df.iloc[:,0])]
Which returns:
0 1 2 3
3 11 2 1 32
5 13 2 1 108
Contrary to your desired output, this removes the row with 000000011
, but not the one with 000000108
, because 000000011
is found in both columns, but 000000108
is not
Upvotes: 2