Thomas Matthew

Reputation: 2886

Intersection between values in a pandas dataframe

Problem Statement:

Create a new pandas DataFrame column holding a flag, 1 (intersection) or 0 (no intersection), that says whether the row values in two different columns, row_mods and col_mods, overlap. A second column shows which value(s) overlap. In the example below, intersect holds the flag and common holds the intersecting value(s).

The rendered pandas DataFrame is what I have; the drawn portion is what I'm looking for:


Setup:

# data
import numpy as np
import pandas as pd

n = np.nan
congruent = pd.DataFrame.from_dict(  
         {'row': ['x','a','b','c','d','e','y'],
            'x': [ n,  5,   5,  5,  5,  5, 5],
            'a': [ 5, n, -.8,-.6,-.3, .8, .01],
            'b': [ 5,-.8,  n, .5, .7,-.9, .01],
            'c': [ 5,-.6, .5,  n, .3, .1, .01],
            'd': [ 5,-.3, .7, .3,  n, .2, .01],
            'e': [ 5, .8,-.9, .1, .2,  n, .01],
            'y': [ 5, .01, .01, .01, .01,  .01, n],
       }).set_index('row')
congruent.columns.names = ['col']
memberships = {'a':['vowel'], 'b':['consonant'], 'c':['consonant'], 'd':['consonant'], 'e':['vowel'], 'y':['consonant', 'vowel'], '*':['wildcard']}

# format stacked df
cs = congruent.stack().to_frame()
cs.columns = ['score']
cs.reset_index(inplace=True)
cs.columns = ['row', 'col', 'score']

# keep only rows whose row and col labels are both keys of the membership dict
cs['elim'] = cs['row'].isin(memberships.keys()) & cs['col'].isin(memberships.keys())
cs_2 = cs[cs['elim']].copy()  # copy so the assignments below do not write to a slice of cs

# map row/col entries to membership dict values
cs_2['row_mods'] = cs_2['row'].map(memberships)
cs_2['col_mods'] = cs_2['col'].map(memberships)

How can I compute the intersection of two values that sit in different columns of the same row?

Upvotes: 0

Views: 3933

Answers (2)

epattaro

Reputation: 2438

Try this, mate:

Step 1, define the function:

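def check_row(row_mods, col_mods):
    # collect every col_mods entry that also appears in row_mods
    common = []
    intersect = 0
    for x in col_mods:
        if x in row_mods:
            intersect = 1
            common.append(x)
    # no overlap: report NaN instead of an empty list
    if intersect == 0:
        common.append(np.nan)
    return (intersect, common)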

Step 2, apply the function:

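# placeholder columns; None gives an object column, so a cell can hold a list
cs_2['intersect'] = 0
cs_2['common'] = None

for index in cs_2.index:
    intersect, common = check_row(cs_2.at[index, 'row_mods'], cs_2.at[index, 'col_mods'])
    cs_2.at[index, 'intersect'] = intersect
    # .at writes a single cell, so the whole list is stored as one value
    cs_2.at[index, 'common'] = common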
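
As a rough sketch of the same two steps without the index loop, the function can be called once per row by zipping the two columns together. The frame `demo` below is a made-up stand-in for `cs_2` (only the column names come from the question), and `check_row` is restated in compact form so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

def check_row(row_mods, col_mods):
    # Compact restatement of the function from step 1
    common = [x for x in col_mods if x in row_mods]
    if common:
        return 1, common
    return 0, [np.nan]

# Hypothetical stand-in for cs_2, using the question's column names
demo = pd.DataFrame({
    'row_mods': [['vowel'], ['consonant', 'vowel'], ['consonant', 'vowel']],
    'col_mods': [['consonant'], ['vowel'], ['consonant', 'vowel']],
})

# One check_row call per row, then split the (intersect, common) pairs
pairs = [check_row(r, c) for r, c in zip(demo['row_mods'], demo['col_mods'])]
demo['intersect'] = [p[0] for p in pairs]
demo['common'] = [p[1] for p in pairs]

print(demo)
```

Building both columns from plain Python lists avoids writing list values into individual cells, which is the part of the loop most likely to cause trouble.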

Hope it helps! If it does, upvote/accept the answer :)

Upvotes: 1

Prune

Reputation: 77857

Since you're apparently comfortable with the pandas operations, I'll supply just the Python intersection logic:

common = list(set(row_mods).intersection(set(col_mods)))
intersect = len(common) > 0

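Briefly, you turn each list of mods into a set, use the set type's built-in intersection method, and turn the result back into a list. Note that sets are unordered, so the order of common is not guaranteed, and that intersect here is a bool; wrap it in int() if you need 1/0.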
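
For illustration, here is a minimal sketch that wraps those two lines in a helper returning the 1/0 flag the question asks for. The name `intersect_mods` is hypothetical, and the `memberships` dict is a trimmed copy of the one in the question; `sorted()` is an added choice to make the output order reproducible:

```python
def intersect_mods(row_mods, col_mods):
    """Hypothetical helper: return (intersect, common) for two lists of labels."""
    # sorted() gives a reproducible order, since sets are unordered
    common = sorted(set(row_mods) & set(col_mods))
    return int(len(common) > 0), common

# Trimmed copy of the question's memberships dict
memberships = {'a': ['vowel'], 'b': ['consonant'], 'y': ['consonant', 'vowel']}

print(intersect_mods(memberships['a'], memberships['y']))  # (1, ['vowel'])
print(intersect_mods(memberships['a'], memberships['b']))  # (0, [])
```

Unlike a version that appends NaN, this one returns an empty list when nothing overlaps; which of the two is preferable depends on how the common column is used afterwards.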

Does that solve your problem?

Upvotes: 2
