Reputation: 2871
I have two data frame as shown below.
user table: Details about the courses and modules attended by each users.
user_id courses. Num_of_course attended_modules Total_Modules
1 [A] 1 {A:[1,2,3,4,5,6]} 6
2 [A,B,C] 3 {A:[8], B:[5], C:[6]} 3
3 [A,B] 2 {A:[2,3,9], B:[10]} 4
4 [A] 1 {A:[3]} 1
5 [B] 1 {B:[5]} 1
6 [A] 1 {A:[3]} 1
7 [B] 1 {B:[5]} 1
8 [A] 1 {A:[4]} 1
Course table: Details about the courses and all the modules in that course and popular modules .
course_id modules #users Popular_modules
A [1,2,3,4,5,6,8,9] 5 [3,2]
B [5,8] 4 [5]
C [6,10] 2 []
From the above tables compare two column attended_modules and modules and suggest unattended modules for each users among the attended courses.
Expected Output:
user_id courses. Recommended_modules
1 [A,B] {A:[8,9]}
2 [A,B,C] {A:[3,2,1], B:[8], C:[10]}
3 [A,B] {A:[1,4,5], B:[5,8]}
4 [A] {A:[2,1,4]}
5 [B] {B:[8]}
6 [A] {A:[2,1,4]}
7 [B] {B:[8]}
8 [A] {A:[3,2,1]}
EDIT:
Added the user_id = 9 (last row)
user_id courses. Num_of_course attended_modules Total_Modules
1 [A] 2 {A:[1,2,3,4,5,6]} 6
2 [A,B,C] 3 {A:[8], B:[5], C:[6]} 3
3 [A,B] 2 {A:[2,3,9], B:[10]} 4
4 [A] 1 {A:[3]} 1
5 [B] 1 {B:[5]} 1
6 [A] 1 {A:[3]} 1
7 [B] 1 {B:[5]} 1
8 [A] 1 {A:[4]} 1
9 [A] 1 {A:[1,2,3,4,5,6,8,9], B:[5]} 8
The output of the above code is
user_id courses. Recommended_modules
1 [A] {A:[8,9]}
2 [A,B,C] {A:[3,2,1], B:[8], C:[10]}
3 [A,B] {A:[1,4,5], B:[5,8]}
4 [A] {A:[2,1,4]}
5 [B] {B:[8]}
6 [A] {A:[2,1,4]}
7 [B] {B:[8]}
8 [A] {A:[3,2,1]}
9 [A,B] {A:[], B:[8]}
Where in the Recommended_modules for user_id = 9, course A has empty list
ie {A:[], B:[8]} expected is {B:[8]}
Expected output:
user_id courses. Recommended_modules
1 [A] {A:[8,9]}
2 [A,B,C] {A:[3,2,1], B:[8], C:[10]}
3 [A,B] {A:[1,4,5], B:[5,8]}
4 [A] {A:[2,1,4]}
5 [B] {B:[8]}
6 [A] {A:[2,1,4]}
7 [B] {B:[8]}
8 [A] {A:[3,2,1]}
9 [A,B] {B:[8]}
Upvotes: 1
Views: 62
Reputation: 862641
Use cunstom function with difference between dictionaries, lastm create new dictionarie with if-else
for empty lists in input is also empty list:
df2 = df2.set_index('course_id')
mo = df2['modules'].to_dict()
#print (mo)
pop = df2['Popular_modules'].to_dict()
#print (pop)
def f(x):
out = {}
for k, v in x.items():
dif = [i for i in mo[k] if i not in v]
a = [i for i in pop[k] if i in dif]
b = [i for i in dif if i not in pop[k]]
c = a + b
out[k] = c[:3]
return out
df1['Recommended_modules'] = df1['attended_modules'].apply(f)
df1 = df1[['user_id','courses.','Recommended_modules']]
print (df1)
user_id courses. Recommended_modules
0 1 [A,B] {'A': [8, 9]}
1 2 [A,B,C] {'A': [3, 2, 1], 'B': [8], 'C': [10]}
2 3 [A,B] {'A': [1, 4, 5], 'B': [5, 8]}
3 4 [A] {'A': [2, 1, 4]}
4 5 [B] {'B': [8]}
5 6 [A] {'A': [2, 1, 4]}
6 7 [B] {'B': [8]}
7 8 [A] {'A': [3, 2, 1]}
EDIT: If empty list are in dictionaries in attended_modules
input column and need remove it in output function is changed:
def f(x):
out = {}
for k, v in x.items():
dif = [i for i in mo[k] if i not in v]
a = [i for i in pop[k] if i in dif]
b = [i for i in dif if i not in pop[k]]
c = a + b
if len(v) > 0:
out[k] = c[:3]
return out
Upvotes: 1