Intersection between values in a pandas dataframe

Question

Problem Statement:

Create a new pandas dataframe column, intersect, holding 1 (intersection) or 0 (no intersection) depending on whether the row values in two different columns, row_mods and col_mods, share any element. A second column, common, shows what the overlapping value(s) is (are). As in the example below, intersect takes the values 1 or 0, and common shows the intersecting value(s).

The rendered pandas dataframe is what I have, the drawn portion is what I'm looking for:

Setup:

# imports
import numpy as np
import pandas as pd

# data
n = np.nan
congruent = pd.DataFrame.from_dict(  
         {'row': ['x','a','b','c','d','e','y'],
            'x': [ n,  5,   5,  5,  5,  5, 5],
            'a': [ 5, n, -.8,-.6,-.3, .8, .01],
            'b': [ 5,-.8,  n, .5, .7,-.9, .01],
            'c': [ 5,-.6, .5,  n, .3, .1, .01],
            'd': [ 5,-.3, .7, .3,  n, .2, .01],
            'e': [ 5, .8,-.9, .1, .2,  n, .01],
            'y': [ 5, .01, .01, .01, .01,  .01, n],
       }).set_index('row')
congruent.columns.names = ['col']
memberships = {'a':['vowel'], 'b':['consonant'], 'c':['consonant'], 'd':['consonant'], 'e':['vowel'], 'y':['consonant', 'vowel'], '*':['wildcard']}

# format stacked df
cs = congruent.stack().to_frame()
cs.columns = ['score']
cs.reset_index(inplace=True)
cs.columns = ['row', 'col', 'score']

# keep only rows whose row and col labels are both keys of the membership dict
cs['elim'] = (cs['row'].isin(memberships.keys())) & (cs['col'].isin(memberships.keys()))
cs_2 = cs[cs['elim']].copy()  # copy so the assignments below don't act on a slice of cs

# map row and col entries to membership dict values
cs_2['row_mods'] = cs_2['row'].map(memberships)
cs_2['col_mods'] = cs_2['col'].map(memberships)

How can I compute, for each row, the intersection of the values held in two different columns?

epattaro · Accepted Answer

try this mate:

step 1, define the function:

def check_row(row_mods, col_mods):
    common = []
    intersect = 0
    # collect every entry of col_mods that also appears in row_mods
    for x in col_mods:
        if x in row_mods:
            intersect = 1
            common.append(x)
    # no overlap: mark common with NaN
    if intersect == 0:
        common.append(np.nan)
    return (intersect, common)
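
As a side note, the same check can be sketched with a set lookup instead of the flag variable. The name check_row_sets is hypothetical and not part of the answer above; it is meant to mirror check_row (same return shape, NaN placeholder when nothing overlaps), assuming the list entries are hashable values such as strings.

```python
import numpy as np

def check_row_sets(row_mods, col_mods):
    # Build a set once for fast membership tests
    row_set = set(row_mods)
    # Iterate col_mods so the result keeps its order
    common = [x for x in col_mods if x in row_set]
    if not common:
        # No overlap: same NaN placeholder as check_row
        return (0, [np.nan])
    return (1, common)

print(check_row_sets(['vowel'], ['consonant', 'vowel']))
print(check_row_sets(['vowel'], ['consonant']))
```

For the short membership lists in the question the speed difference is negligible; the main gain is that the overlap is computed in one expression.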

step 2, apply the function:

cs_2['intersect'] = ''
cs_2['common'] = ''

for index in cs_2.index:
    (intersect, common) = check_row(cs_2.at[index,'row_mods'], cs_2.at[index,'col_mods'])
    cs_2.at[index,'intersect'] = intersect
    # .at stores the whole list in one cell; .loc may try to unpack a list value
    cs_2.at[index,'common'] = common
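
If the explicit index loop feels heavy, one possible variation is to build both columns from a list comprehension over the two columns and assign them in one go. This is only a sketch under assumptions: df is a small hypothetical stand-in for cs_2, and overlap is an illustrative helper name, neither taken from the question.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for cs_2, holding only the two list columns
df = pd.DataFrame({
    'row_mods': [['vowel'], ['vowel'], ['consonant', 'vowel']],
    'col_mods': [['consonant'], ['vowel'], ['consonant', 'vowel']],
})

def overlap(row_mods, col_mods):
    # Entries of col_mods that also appear in row_mods, in col_mods order
    return [x for x in col_mods if x in row_mods]

# Compute the overlap for every row first
common = [overlap(r, c) for r, c in zip(df['row_mods'], df['col_mods'])]

# 1 when the overlap is non-empty, 0 otherwise
df['intersect'] = [int(bool(c)) for c in common]
# Wrap in a Series so each list is stored as a single cell value
df['common'] = pd.Series([c if c else [np.nan] for c in common], index=df.index)

print(df[['intersect', 'common']])
```

Building the values up front and assigning whole columns avoids setting list values cell by cell, which is where assignment through .loc tends to get awkward.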

hope it helps! if it does upvote/check answer :)
