Keep track of elements of a dataframe

Question

This question is the extension of this one: How to keep track of element from another dataframe in pandas. Suppose I have the following dataframe:

df = pd.DataFrame({"call 1": ['debit card','bond',np.nan],
                  "call 2": ['debit card','mortgage','spending limit'],
                  "call 3":['payment limit','debit card',np.nan]})

which is:

       call 1          call 2         call 3
0  debit card      debit card  payment limit
1        bond        mortgage     debit card
2         NaN  spending limit            NaN

I further do some clustering and I get the next dataframe:

dfc = pd.DataFrame( {'cluster 1': ['payment limit', 'spending limit',np.nan],
 'cluster 2': ['debit card', 'debit card', 'debit card'],
 'cluster 3': [ 'bond', 'mortgage',np.nan]})

which is

        cluster 1   cluster 2 cluster 3
0   payment limit  debit card      bond
1  spending limit  debit card  mortgage
2             NaN  debit card       NaN

Now I want to keep track of the elements of the clustering table. Namely, I want to know which elements comes from which call, the desired result should look something like:

    cluster 1                cluster 2                 cluster 3
0   {call 3, payment limit}  {call 1, debit card}      {call 1, bond}
1  {call 2, pending limit}  {call 2, debit card}    {call 2, mortgage}
2             NaN           {call 3, debit card}         NaN

Using the methods in the previous question would fail, firstly because in many calls (i.e. 1,2 and 3) the elements are not distinct. Secondly,

dfc.applymap(lambda x: df[df.eq(x)].dropna(how='all').dropna(axis=1).to_dict('records')[0])

would work if we had 2 debit cards, for anything above 2 I get IndexError: list index out of range simply because the method above tries to make a dictionary.

Update: Using

lookup_dict = {}
look_df = df.T

for col in look_df.columns:
    lookup_dict.update(dict(zip(look_df[col], look_df.index)))
    
pd.concat([dfc.replace(lookup_dict), dfc]).astype(str).groupby(level=0).agg(lambda x: dict([tuple(x)]))

would also fail, as all debit cards are counted to be from call 3 which is wrong. In fact, the above gives:

cluster 1   cluster 2   cluster 3
0   {'call 3': 'payment limit'} {'call 3': 'debit card'}    {'call 1': 'bond'}
1   {'call 2': 'spending limit'}    {'call 3': 'debit card'}    {'call 2': 'mortgage'}
2   {'call 3': 'nan'}   {'call 3': 'debit card'}    {'call 3': 'nan'}

Keep track of elements of a dataframe

Answers (1)

Related Questions