NewGuy

Reputation: 3423

How can I get a list of all permutations of two-column combinations based on another column's value?

My end goal is to create a Force-Directed graph with d3 that shows clusters of users that utilize certain features in my applications. To do this, I need to create a set of "links" that have the following format (taken from the above link):

{"source": "Napoleon", "target": "Myriel", "value": 1}

To get to this step though, I start with a pandas dataframe that looks like this. How can I generate a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID?

        APP_NAME      FEAT_ID   USER_ID  CNT  
280     app1          feature1  user1    114  
2622    app2          feature2  user1    8  
1698    app2          feature3  user1    15  
184     app3          feature4  user1    157  
2879    app2          feature5  user1    7  
3579    app2          feature6  user1    5  
232     app2          feature7  user1    136  
295     app2          feature8  user1    111  
2620    app2          feature9  user1    8  
2047    app3         feature10  user2    11  
3395    app2          feature2  user2    5  
3044    app2         feature11  user2    6  
3400    app2         feature12  user2    5  

Expected Results:

Based on the above dataframe, I'd expect user1 and user2 to generate the following permutations

user1:
    app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature6 -> app2-feature7, app2-feature8, app2-feature9
    app2-feature7 -> app2-feature8, app2-feature9
    app2-feature8 -> app2-feature9

user2:
    app3-feature10 -> app2-feature2, app2-feature11, app2-feature12
    app2-feature2  -> app2-feature11, app2-feature12
    app2-feature11 -> app2-feature12

From this, I'd expect to be able to generate the expected inputs to D3, which would look like this for user2.

{"source": "app3-feature10", "target": "app2-feature2"}
{"source": "app3-feature10", "target": "app2-feature11"}
{"source": "app3-feature10", "target": "app2-feature12"}
{"source": "app2-feature2", "target": "app2-feature11"}
{"source": "app2-feature2", "target": "app2-feature12"}
{"source": "app2-feature11", "target": "app2-feature12"}

How can I generate a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID in my dataframe?

Upvotes: 1

Views: 597

Answers (1)

StarFox

Reputation: 637

I would look at making some tuples out of your dataframe and then using something like itertools.permutations to create all the permutations, and then from there, craft your dictionaries as you need:

import itertools

allUserPermutations = {}

groupedByUser = df.groupby('USER_ID')
for k, g in groupedByUser:

    requisiteColumns = g[['APP_NAME', 'FEAT_ID']]

    # tuples out of dataframe rows
    userCombos = [tuple(x) for x in requisiteColumns.values]

    # itertools.permutations returns a lazy iterator of ordered pairs
    userPermutations = itertools.permutations(userCombos, 2)

    # create a list of specified dicts for the current user
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations]

    # store the current users specified dicts
    allUserPermutations[k] = userPermutations 

If permutations doesn't give the desired behavior (it yields every ordered pair, so both A -> B and B -> A appear), you could try one of the other combinatoric generators in itertools, such as itertools.combinations. Hopefully this kind of strategy works (I don't have a pandas-enabled REPL to test it at the moment). Best of luck!
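
The expected results in the question list each pair only once, which looks closer to itertools.combinations than to itertools.permutations. Here is a minimal sketch under that assumption, using the user2 rows from the question as sample data. The helper name build_links and the "-" separator for node labels are my own choices for illustration, not something from the question or from D3.

```python
import itertools

import pandas as pd

# Sample rows for user2, copied from the question's dataframe
df = pd.DataFrame({
    "APP_NAME": ["app3", "app2", "app2", "app2"],
    "FEAT_ID": ["feature10", "feature2", "feature11", "feature12"],
    "USER_ID": ["user2"] * 4,
    "CNT": [11, 5, 6, 5],
})


def build_links(frame):
    """Hypothetical helper: map each USER_ID to a list of D3-style link dicts."""
    links_by_user = {}
    # sort=False keeps users in order of first appearance
    for user, group in frame.groupby("USER_ID", sort=False):
        # Join the two columns into "app-feature" node labels, in row order
        nodes = (group["APP_NAME"] + "-" + group["FEAT_ID"]).tolist()
        # combinations yields each unordered pair once, preserving input order
        links_by_user[user] = [
            {"source": s, "target": t}
            for s, t in itertools.combinations(nodes, 2)
        ]
    return links_by_user


links = build_links(df)
for link in links["user2"]:
    print(link)
```

If a user has repeated APP_NAME/FEAT_ID rows, the same label would appear more than once in nodes, so it may be worth dropping duplicates before pairing.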

Upvotes: 1
