Reputation: 753
I'm looking to take my existing DF with a number of columns and perform the following operation:
For each originalRow in the DF
check if another row exists where:
row.col1 = originalRow.col1,
and
row.col2 != originalRow.col2
Is there a way to do this gracefully in Python / Pandas?
I've looked into using the .where operator, but the issue I run into is that my conditions compare one row's column values against another row's values. (Something like this answer: https://stackoverflow.com/a/43481338/3757782, but for two different rows.)
Something like:
df["New Col"] = np.where((df["col1"] == df["col1*"]) & (df["col2"] != df["col2*"]), 1, 0)
Except that col1* has to be another row from col1, etc. Let me know if this question doesn't make sense. I'm hoping that this is something that isn't that hard, and I'm just missing what the standard way to do this is.
Thanks!
Example Data:
df =
Col1, Col2
a, 1
a, 2
b, 1
b, 1
c, 1
c, 2
c, 3
d, 1
Expected Output:
df =
Col1, Col2, newCol
a, 1, 1
a, 2, 1
b, 1, 0
b, 1, 0
c, 1, 1
c, 2, 1
c, 3, 1
d, 1, 0
The two a rows get 1 (true) because, for each of them, another row exists where col1 = col1* and col2 != col2*.
The two b rows get 0 (false) because they share the same col2 value, so the condition is not met.
The three c rows get 1 (true) for the same reason as the a rows.
The d row gets 0 (false) because no other d row exists.
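For reference, the rule above can be written out literally as a slow row-by-row check. This is only a sketch to pin down the expected output, assuming the column names Col1 and Col2 from the example data; has_partner is a made-up helper name, not a pandas function.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "Col1": ["a", "a", "b", "b", "c", "c", "c", "d"],
    "Col2": [1, 2, 1, 1, 1, 2, 3, 1],
})

def has_partner(row, frame):
    """Return 1 if some row shares Col1 with `row` but has a different Col2."""
    same_col1 = frame["Col1"] == row["Col1"]
    other_col2 = frame["Col2"] != row["Col2"]
    # A row can never match itself here, because its own Col2 is not "different"
    return int((same_col1 & other_col2).any())

# O(n^2) comparison: fine as a reference on small data, too slow for large frames
df["newCol"] = df.apply(lambda row: has_partner(row, df), axis=1)
```

On the example data this reproduces the newCol column shown in the expected output.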
Upvotes: 0
Views: 87
Reputation: 26676
Let's try
df['newCol'] = np.where((df.Col1.eq(df.Col1.shift(-1)) | df.Col1.shift(1).eq(df.Col1)) & df.Col2.ne(df.Col2.shift(-1)), 1, 0)
Col1 Col2 newCol
0 a 1 1
1 a 2 1
2 b 1 0
3 b 1 0
4 c 1 1
5 c 2 1
6 c 3 1
7 d 1 0
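Note that the shift-based comparison only looks at neighbouring rows, so it depends on the row order of the frame. A sketch of an order-independent alternative, assuming the column names Col1 and Col2 from the example and no missing values in Col2 (nunique ignores NaN by default): a row has a partner with a different Col2 exactly when its Col1 group contains more than one distinct Col2 value.

```python
import pandas as pd

# Same rows as the example data, shuffled on purpose to show order does not matter
df = pd.DataFrame({
    "Col1": ["c", "a", "b", "c", "d", "a", "b", "c"],
    "Col2": [1, 1, 1, 2, 1, 2, 1, 3],
})

# Number of distinct Col2 values within each Col1 group, broadcast back to every row
distinct = df.groupby("Col1")["Col2"].transform("nunique")

# More than one distinct value means some other row in the group differs in Col2
df["newCol"] = (distinct > 1).astype(int)
```

Here the a and c rows get 1 and the b and d rows get 0, regardless of where they sit in the frame.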
Upvotes: 1
Reputation: 2534
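How about this?
cols = df.columns.tolist()
# Count how often each full row (combination of all columns) occurs
count = df.groupby(cols).size().reset_index(name='COUNT')
# Rows that occur more than once are exact duplicates, so flag them with 0
count.loc[count['COUNT'] > 1, 'COUNT'] = 0
df = df.merge(count, on=cols, how='left')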
EDIT :
cols = ['col1', 'col2']  # adjust to your column names (the example data uses Col1, Col2)
# Count how often each (col1, col2) pair occurs
count = df.groupby(cols).size().reset_index(name='COUNT')
# Pairs that occur more than once are duplicates, so flag them with 0
count.loc[count['COUNT'] > 1, 'COUNT'] = 0
df = df.merge(count, on=cols, how='left')
# Sum the remaining flags per col1 value
count_col1 = df.groupby('col1')['COUNT'].sum().reset_index(name='COUNT_COL1')
# A col1 value with at most one flagged row has no partner row, so reset it to 0
single = count_col1.loc[count_col1['COUNT_COL1'] <= 1, 'col1']
df.loc[df['col1'].isin(single), 'COUNT'] = 0
Upvotes: 1