Danish
Danish

Reputation: 2871

compare two column in two table in pandas based on the specific condition

I have two data frame as shown below.

user table: Details about the courses and modules attended by each users.

user_id     courses.  Num_of_course     attended_modules              Total_Modules
1           [A]       1                 {A:[1,2,3,4,5,6]}             6
2           [A,B,C]   3                 {A:[8], B:[5], C:[6]}         3 
3           [A,B]     2                 {A:[2,3,9], B:[10]}           4
4           [A]       1                 {A:[3]}                       1
5           [B]       1                 {B:[5]}                       1
6           [A]       1                 {A:[3]}                       1
7           [B]       1                 {B:[5]}                       1
8           [A]       1                 {A:[4]}                       1

Course table: Details about the courses and all the modules in that course and popular modules .

course_id         modules                 #users        Popular_modules
A                 [1,2,3,4,5,6,8,9]       5             [3,2]
B                 [5,8]                   4             [5]
C                 [6,10]                  2             []

From the above tables compare two column attended_modules and modules and suggest unattended modules for each users among the attended courses.

  1. Suggest the popular module in the attended course if the user not already attended that module.
  2. Suggest the unattended module in the ongoing courses in the order as in the course table.
  3. Suggest 3 modules to all users from each courses he has attended, if 3 modules available in that course.

Expected Output:

user_id         courses.       Recommended_modules              
    1           [A,B]          {A:[8,9]}    
    2           [A,B,C]        {A:[3,2,1], B:[8], C:[10]}          
    3           [A,B]          {A:[1,4,5], B:[5,8]}           
    4           [A]            {A:[2,1,4]}                       
    5           [B]            {B:[8]}                       
    6           [A]            {A:[2,1,4]}                       
    7           [B]            {B:[8]}                       
    8           [A]            {A:[3,2,1]}

EDIT:

Added the user_id = 9 (last row)

user_id     courses.  Num_of_course     attended_modules              Total_Modules
1           [A]       2                 {A:[1,2,3,4,5,6]}             6
2           [A,B,C]   3                 {A:[8], B:[5], C:[6]}         3 
3           [A,B]     2                 {A:[2,3,9], B:[10]}           4
4           [A]       1                 {A:[3]}                       1
5           [B]       1                 {B:[5]}                       1
6           [A]       1                 {A:[3]}                       1
7           [B]       1                 {B:[5]}                       1
8           [A]       1                 {A:[4]}                       1
9           [A]       1                 {A:[1,2,3,4,5,6,8,9], B:[5]}  8 

The output of the above code is

     user_id        courses.       Recommended_modules              
        1           [A]            {A:[8,9]}    
        2           [A,B,C]        {A:[3,2,1], B:[8], C:[10]}          
        3           [A,B]          {A:[1,4,5], B:[5,8]}           
        4           [A]            {A:[2,1,4]}                       
        5           [B]            {B:[8]}                       
        6           [A]            {A:[2,1,4]}                       
        7           [B]            {B:[8]}                       
        8           [A]            {A:[3,2,1]}
        9           [A,B]          {A:[], B:[8]} 

Where in the Recommended_modules for user_id = 9, course A has empty list

ie {A:[], B:[8]} expected is {B:[8]}

Expected output:

         user_id        courses.       Recommended_modules              
            1           [A]            {A:[8,9]}    
            2           [A,B,C]        {A:[3,2,1], B:[8], C:[10]}          
            3           [A,B]          {A:[1,4,5], B:[5,8]}           
            4           [A]            {A:[2,1,4]}                       
            5           [B]            {B:[8]}                       
            6           [A]            {A:[2,1,4]}                       
            7           [B]            {B:[8]}                       
            8           [A]            {A:[3,2,1]}
            9           [A,B]          {B:[8]}

Upvotes: 1

Views: 62

Answers (1)

jezrael
jezrael

Reputation: 862641

Use cunstom function with difference between dictionaries, lastm create new dictionarie with if-else for empty lists in input is also empty list:

df2 = df2.set_index('course_id')
mo = df2['modules'].to_dict()
#print (mo)

pop = df2['Popular_modules'].to_dict()
#print (pop)

def f(x):
    out = {}
    for k, v in x.items():
        dif = [i for i in mo[k] if i not in v]
        a = [i for i in pop[k] if i in dif]
        b = [i for i in dif if i not in pop[k]]
        c = a + b
        out[k] = c[:3]    
    return out

df1['Recommended_modules'] = df1['attended_modules'].apply(f)
df1 = df1[['user_id','courses.','Recommended_modules']]
print (df1)
   user_id courses.           Recommended_modules
0        1    [A,B]           {'A': [8, 9]}
1        2  [A,B,C]           {'A': [3, 2, 1], 'B': [8], 'C': [10]}
2        3    [A,B]           {'A': [1, 4, 5], 'B': [5, 8]}
3        4      [A]           {'A': [2, 1, 4]}
4        5      [B]           {'B': [8]}
5        6      [A]           {'A': [2, 1, 4]}
6        7      [B]           {'B': [8]}
7        8      [A]           {'A': [3, 2, 1]}

EDIT: If empty list are in dictionaries in attended_modules input column and need remove it in output function is changed:

def f(x):
    out = {}
    for k, v in x.items():
        dif = [i for i in mo[k] if i not in v]
        a = [i for i in pop[k] if i in dif]
        b = [i for i in dif if i not in pop[k]]
        c = a + b
        if len(v) > 0:
            out[k] = c[:3]    
    return out

Upvotes: 1

Related Questions