Reputation: 3423
My end goal is to create a Force-Directed graph with d3 that shows clusters of users that utilize certain features in my applications. To do this, I need to create a set of "links" that have the following format (taken from the above link):
{"source": "Napoleon", "target": "Myriel", "value": 1}
To get to this step, though, I start with a pandas dataframe that looks like the one below. How can I generate a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID?
     APP_NAME    FEAT_ID USER_ID  CNT
280      app1   feature1   user1  114
2622     app2   feature2   user1    8
1698     app2   feature3   user1   15
184      app3   feature4   user1  157
2879     app2   feature5   user1    7
3579     app2   feature6   user1    5
232      app2   feature7   user1  136
295      app2   feature8   user1  111
2620     app2   feature9   user1    8
2047     app3  feature10   user2   11
3395     app2   feature2   user2    5
3044     app2  feature11   user2    6
3400     app2  feature12   user2    5
Expected Results:
Based on the above dataframe, I'd expect user1 and user2 to generate the following permutations:
user1:
app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature6 -> app2-feature7, app2-feature8, app2-feature9
app2-feature7 -> app2-feature8, app2-feature9
app2-feature8 -> app2-feature9
user2:
app3-feature10 -> app2-feature2, app2-feature11, app2-feature12
app2-feature2 -> app2-feature11, app2-feature12
app2-feature11 -> app2-feature12
From this, I'd expect to be able to generate the expected inputs to D3, which would look like this for user2:
{"source": "app3-feature10", "target": "app2-feature2"}
{"source": "app3-feature10", "target": "app2-feature11"}
{"source": "app3-feature10", "target": "app2-feature12"}
{"source": "app2-feature2", "target": "app2-feature11"}
{"source": "app2-feature2", "target": "app2-feature12"}
{"source": "app2-feature11", "target": "app2-feature12"}
How can I generate a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID in my dataframe?
Upvotes: 1
Views: 597
Reputation: 637
I would look at making some tuples out of your dataframe and then using something like itertools.permutations to create all the permutations, and then from there, craft your dictionaries as you need:
import itertools

allUserPermutations = {}
groupedByUser = df.groupby('USER_ID')
for k, g in groupedByUser:
    requisiteColumns = g[['APP_NAME', 'FEAT_ID']]
    # tuples out of dataframe rows
    userCombos = [tuple(x) for x in requisiteColumns.values]
    # this is a generator obj yielding ordered pairs (both A->B and B->A)
    userPermutations = itertools.permutations(userCombos, 2)
    # create a list of specified dicts for the current user;
    # s and tar are (APP_NAME, FEAT_ID) tuples
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations]
    # store the current user's specified dicts
    allUserPermutations[k] = userPermutations
If the permutations don't return the desired behavior, you could try some other combinatoric generators found here. Hopefully, this kind of strategy works (I don't have a pandas-enabled REPL to test it, at the moment). Best of luck!
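Since the expected output in the question lists each pair only once (A -> B but never B -> A), itertools.combinations may be a closer fit than permutations. Here is a minimal sketch under a few assumptions: the sample frame only mirrors the user2 rows from the question, links_per_user is a hypothetical helper name, and the "app-feature" node label is built by joining APP_NAME and FEAT_ID with a hyphen.

```python
import itertools
import json

import pandas as pd

# Sample data mirroring the user2 rows from the question (assumption:
# the real dataframe has the same four columns).
df = pd.DataFrame(
    {
        "APP_NAME": ["app3", "app2", "app2", "app2"],
        "FEAT_ID": ["feature10", "feature2", "feature11", "feature12"],
        "USER_ID": ["user2"] * 4,
        "CNT": [11, 5, 6, 5],
    }
)


def links_per_user(frame):
    """Hypothetical helper: map each USER_ID to a list of D3-style link dicts."""
    links = {}
    # sort=False keeps users in order of first appearance
    for user, group in frame.groupby("USER_ID", sort=False):
        # Build "app-feature" node labels in row order
        nodes = (group["APP_NAME"] + "-" + group["FEAT_ID"]).tolist()
        # combinations yields each unordered pair exactly once
        links[user] = [
            {"source": s, "target": t}
            for s, t in itertools.combinations(nodes, 2)
        ]
    return links


user_links = links_per_user(df)
for link in user_links["user2"]:
    print(json.dumps(link))
```

For n features per user this produces n * (n - 1) / 2 links, which is half of what permutations with r=2 would give.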
Upvotes: 1