Pandas Flag Rows with Complementary Zeros

Question

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'A':[0,4,4,4],
                 'B':[0,4,4,0],
                 'C':[0,4,4,4],
                 'D':[4,0,0,4],
                 'E':[4,0,0,0],
                 'Name':['a','a','b','c']})
df
    A   B   C   D   E   Name
0   0   0   0   4   4   a
1   4   4   4   0   0   a
2   4   4   4   0   0   b
3   4   0   4   4   0   c

I'd like to add a new field called "Match_Flag" which labels unique combinations of rows if they have complementary zero patterns (as with rows 0, 1, and 2) AND have the same name (just for rows 0 and 1). It uses the name of the rows that match.

The desired result is as follows:

    A   B   C   D   E   Name  Match_Flag
0   0   0   0   4   4   a     a
1   4   4   4   0   0   a     a
2   4   4   4   0   0   b     NaN
3   4   0   4   4   0   c     NaN

Caveat: The patterns may vary, but should still be complementary.

Thanks in advance!

UPDATE

Sorry for the confusion. Here is some clarification:

The reason why rows 0 and 1 are "complementary" is that they have opposite patterns of zeros in their columns; 0,0,0,4,4 vs, 4,4,4,0,0. The number 4 is arbitrary; it could just as easily be 0,0,0,4,2 and 65,770,23,0,0. So if 2 such rows are indeed complementary and they have the same name, I'd like for them to be flagged with that same name under the "Match_Flag" column.

piRSquared · Accepted Answer

You can identify a compliment if it's dot product is zero and it's element wise sum is nowhere zero.

def complements(df):
    v = df.drop('Name', axis=1).values
    n = v.shape[0]
    row, col = np.triu_indices(n, 1)

    # ensure two rows are complete
    # their sum contains no zeros
    c = ((v[row] + v[col]) != 0).all(1)
    complete = set(row[c]).union(col[c])

    # ensure two rows do not overlap
    # their product is zero everywhere
    o = (v[row] * v[col] == 0).all(1)
    non_overlap = set(row[o]).union(col[o])

    # we are a compliment iff we do
    # not overlap and we are complete
    complement = list(non_overlap.intersection(complete))

    # return slice
    return df.Name.iloc[complement]

Then groupby('Name') and apply our function

df['Match_Flag'] = df.groupby('Name', group_keys=False).apply(complements)

Pandas Flag Rows with Complementary Zeros

Answers (1)

Related Questions