Sheron
Sheron

Reputation: 615

Python interaction between rows and columns of pandas dataframe

I have this dataframe:

print (df)
       exam       student
    0 French        a
    1 English       a
    2 Italian       a
    3 Chinese       b
    4 Russian       b
    5 German        b
    6 Chinese       c
    7 Spanish       c
    8 English       c
    9 French        c

I need to find for each student the number of students that took the same exams as he did.

It should be something like this:

  exam      student   total_st
0 French       a         1
1 English      a         1
2 Italian      a         1
3 Chinese      b         1
4 Russian      b         1
5 German       b         1
6 German       c         2
7 Spanish      c         2
8 English      c         2 
9 French       c         2

Total number for student A is 1 because it has common exams just with one student (in this case with student C).

Total number for student B is 1 because it has common exams just with one student (in this case with student C).

Total number for student C is 2 because it has common exams with both students (with students A and B).

Any ideas?

Thank you in advance!

Upvotes: 0

Views: 735

Answers (1)

akuiper
akuiper

Reputation: 214927

You can calculate a contingency table of exam and student firstly, and then do a cross product to check if there is any overlap of exams between student and count the number of students that have at least one share exam, and map the result to the original student column:

cont_table = pd.crosstab(df.exam, df.student)

# cont_table.T.dot(cont_table) gives a table how many exams each student shared with 
# another student, -1 to exclude the student himself
shared_count = (cont_table.T.dot(cont_table) != 0).sum(0) - 1  
shared_count

#student
#a    1
#b    1
#c    2
#dtype: int64


df['total_st'] = df.student.map(shared_count)
df

enter image description here

Upvotes: 2

Related Questions