Reputation: 3826
I've got a dataframe, let's say it is defined like this:
df = pd.DataFrame({
'variable' : ["A","A","B","B","A","B","B","A"],
'id1' : [1,2,3,4,5,6,7,8],
'id2' : ['NaN','NaN',2,'NaN',4,3,'NaN',5]
})
So the result is:
id1 id2 variable1
0 1 NaN A
1 2 NaN A
2 3 2 B
3 4 NaN B
4 5 4 A
5 6 3 B
6 7 NaN B
7 8 5 A
Now what I would like to achieve is:
Match rows where id2
in one row is equal to id1
in another row. So in this particular example, paired rows should be:
(2, 1) - because df.iloc[2]['id2'] == df.iloc[1]['id1']
and subsequently: (4, 3), (5, 2), (7, 4)
From these pairs, choose only those where variable1
is constant within a pair, so from the pairs above we will have left only (5, 2) and (7, 4) because only in these cases we have:
df.iloc[5]['variable1'] == df.iloc[2]['variable1']
(example is valid for the first pair of course).
Now, if this were pure Python, I would probably iterate over two lists, containing id1
and id2
, checking equalities of ids and variable
, but I suppose there should be a more array-oriented method...
Perhaps creating a duplicate dataframe and doing an intersection of id1
and id2
will do? Or should I consider a totally different idea?
Upvotes: 3
Views: 4435
Reputation: 32298
How about this?
# Add index column:
df = df.reset_index()
df2 = pd.merge(
df, df
left_on="id1", right_on="id2"
).query("variable_x == variable_y")
list(df2[["index_x", "index_y"]].itertuples(index=False))
Upvotes: 1
Reputation: 78041
maybe:
>>> pd.merge(df[['id2', 'variable']].reset_index(),
... df.reset_index(), how='inner',
... left_on=['id2', 'variable'],
... right_on=['id1', 'variable'])[['index_x', 'index_y']]
index_x index_y
0 5 2
1 7 4
[2 rows x 2 columns]
Upvotes: 2