mrE_dev

Reputation: 13

Using Python pandas, how do I map a column of lists of ID numbers to a new column of lists of names taken from a dictionary?

I'm finalizing the results of some association rule mining using Python pandas. I'm using mlxtend with association rules and frequent_items. I can mine the rules successfully, but now I'm trying to make the results human readable.

Given an already created dictionary:

{
 66: 'Course Name 1',
 72: 'Course Name 2',
 83: 'Course Name 3',
 84: 'Course Name Etc'
}

My objective is to take the column of itemsets returned from the mining results:

    support         itemsets     
38  0.020280    (66,72)         
39  0.016900    (72,83)         
40  0.014969    (72,84)         
41  0.013037    (83,84)         
42  0.013037    (66,83,84)

which can be obtained from:

df = pd.DataFrame({'support': {38: 0.02028, 39: 0.0169, 40: 0.014969, 41: 0.013037, 42: 0.013037}, 
                   'itemsets': {38: (66, 72), 39: (72, 83), 40: (72, 84), 41: (83, 84), 42: (66, 83, 84)}})

and map over it, adding a third column with the names that correspond to each id in the dictionary. Expected output:

    support     itemsets        fullcoursenames
38  0.020280    (66,72)         'Course Name 1', 'Course Name 2'
39  0.016900    (72,83)         'Course Name 2', 'Course Name 3'
40  0.014969    (72,84)         'Course Name 2', 'Course Name Etc'
41  0.013037    (83,84)         'Course Name 3', 'Course Name Etc'
42  0.013037    (66,83,84)      'Course Name 1', 'Course Name 3', 'Course Name Etc'

I don't know whether I need to define a new function or try this with lambdas. I also don't know the syntax for iterating through each item of the lists in the itemsets rows.

Upvotes: 1

Views: 1275

Answers (2)

Georgina Skibinski

Reputation: 13407

Assuming your mapping dict is stored as course_name_mapping:

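# One row per id; the original index labels are repeated for each element
exploded = df.explode("itemsets")
# Look up the name for each id (ids missing from the dict become NaN)
exploded["fullcoursenames"] = exploded["itemsets"].map(course_name_mapping)
# Collect the names back into one list per original row; support and itemsets stay untouched
df["fullcoursenames"] = exploded.groupby(level=0)["fullcoursenames"].agg(list)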
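
For reference, here is a self-contained sketch of the explode approach on the sample data from the question. It aggregates only the mapped column, so support and itemsets keep their original values rather than being collected into lists as well. The name course_name_mapping is the one assumed in this answer; the intermediate names exploded and names are only for illustration.

```python
import pandas as pd

# Mapping dict from the question (the variable name is the one assumed in this answer)
course_name_mapping = {
    66: "Course Name 1",
    72: "Course Name 2",
    83: "Course Name 3",
    84: "Course Name Etc",
}

# Sample frame from the question
df = pd.DataFrame({
    "support": {38: 0.02028, 39: 0.0169, 40: 0.014969, 41: 0.013037, 42: 0.013037},
    "itemsets": {38: (66, 72), 39: (72, 83), 40: (72, 84), 41: (83, 84), 42: (66, 83, 84)},
})

# One row per id; each original index label is repeated once per element
exploded = df["itemsets"].explode()

# Translate each id to its name (ids missing from the dict would become NaN)
names = exploded.map(course_name_mapping)

# Regroup by the original index and collect the names into one list per row
df["fullcoursenames"] = names.groupby(level=0).agg(list)

print(df)
```

Because the assignment aligns on the index, this relies on the original index labels being unique.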

Upvotes: 1

user7864386

Reputation:

Given the mapper dictionary:

mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}

You can apply a function that gets values from the dictionary mapper to the 'itemsets' column:

df['fullcoursenames'] = df['itemsets'].apply(lambda tpl: [mapper.get(x) for x in tpl])

You can do the very same job using a list comprehension that iterates over the values of df['itemsets'] as well:

df['fullcoursenames'] = [[mapper.get(x) for x in tpl] for tpl in df['itemsets']]

Output:

     support      itemsets                                  fullcoursenames
38  0.020280      (66, 72)                   [Course Name 1, Course Name 2]
39  0.016900      (72, 83)                   [Course Name 2, Course Name 3]
40  0.014969      (72, 84)                 [Course Name 2, Course Name Etc]
41  0.013037      (83, 84)                 [Course Name 3, Course Name Etc]
42  0.013037  (66, 83, 84)  [Course Name 1, Course Name 3, Course Name Etc]
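
One caveat if the column comes straight from mlxtend rather than from the sample frame: as far as I know, mlxtend stores each itemset as a frozenset, whose iteration order is not guaranteed. A minimal sketch, assuming frozenset itemsets, that sorts the ids first so the names come out in a stable order (the two-row frame below is made up for illustration):

```python
import pandas as pd

mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}

# Hypothetical frame mimicking mlxtend-style output, where itemsets are frozensets
df = pd.DataFrame({
    'support': [0.02028, 0.013037],
    'itemsets': [frozenset({66, 72}), frozenset({84, 66, 83})],
})

# Sort the ids before the lookup so the order of names is deterministic
df['fullcoursenames'] = [[mapper.get(x) for x in sorted(items)] for items in df['itemsets']]

print(df)
```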

If you want to join the full course names into a single string, you can also use the join method in the lambda above (although I assume this is not something you want):

df['fullcoursenames'] = df['itemsets'].apply(lambda tpl: ', '.join(mapper.get(x) for x in tpl))

Output:

     support      itemsets                                fullcoursenames
38  0.020280      (66, 72)                   Course Name 1, Course Name 2
39  0.016900      (72, 83)                   Course Name 2, Course Name 3
40  0.014969      (72, 84)                 Course Name 2, Course Name Etc
41  0.013037      (83, 84)                 Course Name 3, Course Name Etc
42  0.013037  (66, 83, 84)  Course Name 1, Course Name 3, Course Name Etc
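
Note that mapper.get(x) returns None for an id that is not in the dictionary, and ', '.join raises a TypeError when it meets None. A minimal sketch that passes a fallback string to get instead; the id 99 and the 'Unknown (...)' label are made up for illustration:

```python
import pandas as pd

mapper = {66: 'Course Name 1', 72: 'Course Name 2', 83: 'Course Name 3', 84: 'Course Name Etc'}

# Hypothetical data: 99 is an id that does not appear in mapper
df = pd.DataFrame({'itemsets': [(66, 72), (72, 99)]})

# Supplying a default to dict.get keeps join from failing on missing ids
df['fullcoursenames'] = df['itemsets'].apply(
    lambda tpl: ', '.join(mapper.get(x, f'Unknown ({x})') for x in tpl)
)

print(df)
```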

Upvotes: 2
