Reputation: 625
I have the following program:
df = pd.DataFrame({
'student':['a'] * 3 + ['b'] * 3 + ['c'] * 4,
'semester':[1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
'passed_exam':[True, False] * 5,
'exam': [
'French', 'English', 'Italian', 'Chinese', 'Russian',
'German', 'Chinese', 'Spanish', 'English', 'French'
]
})
print (df)
passed_exam exam semester student
0 True French 1 a
1 False English 1 a
2 True Italian 2 a
3 False Chinese 2 b
4 True Russian 1 b
5 False German 1 b
6 True Chinese 2 c
7 False Spanish 2 c
8 True English 2 c
9 False French 2 c
Does anybody know how to find the number of students that each student interacted with (through exams)?
Something like this:
passed_exam exam semester student total_st
0 True French 1 a 1
1 False English 1 a 1
2 True Italian 2 a 1
3 False Chinese 2 b 1
4 True Russian 1 b 1
5 False German 1 b 1
6 True German 2 c 2
7 False Spanish 2 c 2
8 True English 2 c 2
9 False French 2 c 2
Thank you in advance!
Upvotes: 1
Views: 100
Reputation: 210972
IIUC you can do it this way:
In [116]: df['total_st'] = df.groupby(['exam','semester'])['student'].transform('size')
In [117]: df
Out[117]:
passed_exam exam semester student total_st
0 True French 1 a 1
1 False English 1 a 1
2 True Italian 2 a 1
3 False Chinese 2 b 2
4 True Russian 1 b 1
5 False German 1 b 1
6 True Chinese 2 c 2
7 False Spanish 2 c 1
8 True English 2 c 1
9 False French 2 c 1
Upvotes: 1
Reputation: 518
The way I understand your question, you want the 'total_st' column correspond to the number of students a student interracted with for a given exam. For instance, if the exam 'French' had 4 student ('a', 'b', 'c', 'd'), the student 'a' interacted with 3 students. Am I right?
If so, here is a solution. First, let's forget about semesters to simplify the problem, and let's consider the following example:
df = pd.DataFrame({
'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
'exam': [
'Chinese', 'English', 'Spanish', 'Chinese', 'Spanish',
'Spanish', 'Chinese', 'Spanish', 'English', 'Chinese'
],
'passed_exam':[True, False] * 5
})
print(df)
exam passed_exam student
0 Chinese True a
1 English False a
2 Spanish True a
3 Chinese False b
4 Spanish True b
5 Spanish False b
6 Chinese True c
7 Spanish False c
8 English True c
9 Chinese False c
Now, we can compute a Series that associate the number of students for each exam, using groupby:
d = df.groupby(['exam'])['student'].count()
print(d)
exam
Chinese 4
English 2
Spanish 4
Name: student, dtype: int64
We get the number of students a student interacted with by removing 1 to each value:
d = d - 1
Finally, we create 'total_st' column using apply, and assign it to the initial dataframe:
total_st = df.apply(lambda x: d.loc[x['exam']], axis=1)
df = df.assign(total_st=total_st)
print(df)
exam passed_exam student total_st
0 Chinese True a 3
1 English False a 1
2 Spanish True a 3
3 Chinese False b 3
4 Spanish True b 3
5 Spanish False b 3
6 Chinese True c 3
7 Spanish False c 3
8 English True c 1
9 Chinese False c 3
Upvotes: 1
Reputation: 25249
I interpret "number of students that each student interacted with (through exams)" as # of students who were sitting the same exam.
Then, it seems like:
df1 = (df
.groupby(["exam","semester"], as_index=False)["student"].agg("count")
.rename(columns={"student":"total_st"}))
df.merge(df1).sort_values(["semester","student"])
passed_exam exam semester student total_st
0 True French 1 a 1
1 False English 1 a 1
5 True Russian 1 b 1
6 False German 1 b 1
2 True Italian 2 a 1
3 False Chinese 2 b 2
4 True Chinese 2 c 2
7 False Spanish 2 c 1
8 True English 2 c 1
9 False French 2 c 1
Upvotes: 1