Reputation: 13
I'm finalizing the results of some association rule mining using Python and pandas. I'm using mlxtend with association rules and frequent_items. I can mine the rules successfully, but now I'm trying to make the results human-readable.
Given an already created dictionary:
{
66: "Course Name 1",
72: 'Course Name 2',
83: 'Course Name 3',
84: 'Course Name Etc'
}
My objective is to take the column of itemsets returned from the mining results:
support itemsets
38 0.020280 (66,72)
39 0.016900 (72,83)
40 0.014969 (72,84)
41 0.013037 (83,84)
42 0.013037 (66,83,84)
which can be obtained from:
import pandas as pd
df = pd.DataFrame({'support': {38: 0.02028, 39: 0.0169, 40: 0.014969, 41: 0.013037, 42: 0.013037},
'itemsets': {38: (66, 72), 39: (72, 83), 40: (72, 84), 41: (83, 84), 42: (66, 83, 84)}})
and map over each itemset, adding a third column with the course names that correspond to the keys in the dictionary. Expected output:
support itemsets fullcoursenames
38 0.020280 (66,72) 'Course Name 1', 'Course Name 2'
39 0.016900 (72,83) 'Course Name 2', 'Course Name 3'
40 0.014969 (72,84) 'Course Name 2', 'Course Name Etc'
41 0.013037 (83,84) 'Course Name 3', 'Course Name Etc'
42 0.013037 (66,83,84) 'Course Name 1', 'Course Name 3', 'Course Name Etc'
I don't know whether I need to define a new function or whether a lambda would do. I also don't know the syntax for iterating over each item in the itemsets rows.
Upvotes: 1
Views: 1275
Reputation: 13407
Assuming your mapping dict is stored as course_name_mapping:
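# One row per course ID; the original index label is repeated on each exploded row
df = df.explode("itemsets")
df["fullcoursenames"] = df["itemsets"].map(course_name_mapping)
# Regroup on the original index: keep one support value, collect IDs and names back into lists
df = df.groupby(level=0).agg({"support": "first", "itemsets": list, "fullcoursenames": list})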
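For anyone who wants to run this end to end, here is a self-contained sketch of the explode-and-regroup idea on the sample data from the question. It assumes pandas 0.25 or later (where `DataFrame.explode` is available). Aggregating `support` with `"first"` is my own choice, made because every exploded row in a group carries the same value; note that `itemsets` comes back as lists rather than tuples.

```python
import pandas as pd

# Mapping from the question, stored under the name assumed in this answer
course_name_mapping = {66: "Course Name 1", 72: "Course Name 2",
                       83: "Course Name 3", 84: "Course Name Etc"}

df = pd.DataFrame({
    "support": {38: 0.02028, 39: 0.0169, 40: 0.014969, 41: 0.013037, 42: 0.013037},
    "itemsets": {38: (66, 72), 39: (72, 83), 40: (72, 84), 41: (83, 84), 42: (66, 83, 84)},
})

# One row per course ID; index labels 38..42 are repeated for each ID
exploded = df.explode("itemsets")
exploded["fullcoursenames"] = exploded["itemsets"].map(course_name_mapping)

# Collapse back to one row per original index label
result = exploded.groupby(level=0).agg(
    {"support": "first", "itemsets": list, "fullcoursenames": list}
)
print(result)
```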
Upvotes: 1
Reputation:
Given the mapper dictionary:
mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}
You can apply a function that gets values from the dictionary mapper to the 'itemsets' column:
df['fullcoursenames'] = df['itemsets'].apply(lambda tpl: [mapper.get(x) for x in tpl])
You can do the very same job using a list comprehension that iterates over the values of df['itemsets'] as well:
df['fullcoursenames'] = [[mapper.get(x) for x in tpl] for tpl in df['itemsets']]
Output:
support itemsets fullcoursenames
38 0.020280 (66, 72) [Course Name 1, Course Name 2]
39 0.016900 (72, 83) [Course Name 2, Course Name 3]
40 0.014969 (72, 84) [Course Name 2, Course Name Etc]
41 0.013037 (83, 84) [Course Name 3, Course Name Etc]
42 0.013037 (66, 83, 84) [Course Name 1, Course Name 3, Course Name Etc]
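Since the question asks whether a function definition or a lambda is needed: either works, and `apply` accepts a named function just as well. A minimal sketch of the named-function form on the question's data; the helper name `ids_to_names` is made up for illustration and is not part of pandas or mlxtend.

```python
import pandas as pd

mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}

df = pd.DataFrame({
    'support': {38: 0.02028, 39: 0.0169, 40: 0.014969, 41: 0.013037, 42: 0.013037},
    'itemsets': {38: (66, 72), 39: (72, 83), 40: (72, 84), 41: (83, 84), 42: (66, 83, 84)},
})

def ids_to_names(itemset, mapping=mapper):
    """Return the list of course names for an iterable of course IDs."""
    # Same lookup as the lambda above: IDs missing from the mapping become None
    return [mapping.get(course_id) for course_id in itemset]

# apply calls ids_to_names once per row, passing that row's tuple of IDs
df['fullcoursenames'] = df['itemsets'].apply(ids_to_names)
print(df)
```

A named function is easier to test on its own, for example by calling `ids_to_names((66, 72))` directly.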
If you want the full course names as a single string, you can also use the join method in the lambda above (although I assume this is not something you want):
df['fullcoursenames'] = df['itemsets'].apply(lambda tpl: ', '.join(mapper.get(x) for x in tpl))
Output:
support itemsets fullcoursenames
38 0.020280 (66, 72) Course Name 1, Course Name 2
39 0.016900 (72, 83) Course Name 2, Course Name 3
40 0.014969 (72, 84) Course Name 2, Course Name Etc
41 0.013037 (83, 84) Course Name 3, Course Name Etc
42 0.013037 (66, 83, 84) Course Name 1, Course Name 3, Course Name Etc
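One caveat with the joined version: `mapper.get(x)` returns `None` for an ID that is not in the dictionary, and `str.join` raises a `TypeError` when it meets a non-string. If your itemsets might contain unmapped IDs, passing a default to `get` avoids that. A small sketch; the ID 99 and the `Unknown (...)` placeholder text are invented purely for illustration.

```python
import pandas as pd

mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}

# 99 is a made-up ID with no entry in mapper
df = pd.DataFrame({'itemsets': {0: (66, 72), 1: (66, 99)}})

# The second argument to get() is returned whenever the ID is missing,
# so join always receives strings
df['fullcoursenames'] = df['itemsets'].apply(
    lambda tpl: ', '.join(mapper.get(x, f'Unknown ({x})') for x in tpl)
)
print(df)
```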
Upvotes: 2