Sheron

Reputation: 615

Python interaction in dataframe

I have the following dataframe:

   exam_id   student  semester
0     01        a        1
1     02        b        2
2     03        c        3
3     01        d        1
4     02        e        2
5     03        f        3
6     01        g        1

I would like to create a new dataframe with four columns: "student", "shared_exam_with", "semester" and "number_of_shared_exam". Each row should pair a student with another student who took the same exam in the same semester:

  student shared_exam_with  semester  number_of_shared_exam
0       a                d         1                      1
1       a                g         1                      1
2       b                e         2                      1
3       c                f         3                      1
4       d                a         1                      1
5       d                g         1                      1
6       e                b         2                      1
7       f                c         3                      1
8       g                a         1                      1
9       g                d         1                      1

Any suggestions?

Upvotes: 0

Views: 231

Answers (1)

piRSquared

Reputation: 294218

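import numpy as np

# df is the dataframe from the question
idx_cols = ['exam_id', 'semester']
std_cols = ['student_x', 'student_y']

# self merge: pair each student with everyone who sat the same exam
d1 = df.merge(df, on=idx_cols)
# drop self pairs; copy so the assignment below does not write to a view
d2 = d1.loc[d1.student_x != d1.student_y, idx_cols + std_cols].copy()

# sort each row so that (a, d) and (d, a) become the same pair
d2[std_cols] = np.sort(d2[std_cols], axis=1)

d3 = d2.drop_duplicates().groupby(
    std_cols + ['semester']).size().reset_index(name='count')

print(d3)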

  student_x student_y  semester  count
0         a         d         1      1
1         a         g         1      1
2         b         e         2      1
3         c         f         3      1
4         d         g         1      1

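How it works

  • self merge on just semester and exam_id, which pairs every student with everyone who sat the same exam
  • drop the rows where a student is paired with themselves
  • sort each row of student pairs so that (a, d) and (d, a) show up as duplicates
  • drop those duplicates
  • group by the student pair (semester is included so it appears in the result) and count the rows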
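
Note that the result above lists each pair once, while the question asks for both directions, (a, d) as well as (d, a). A minimal sketch of one way to get that, assuming the same sample data: keep the self merge and the self-pair filter, but skip the row-wise sort and the duplicate drop. The names `pairs`, `shared` and the `_other` suffix are made up for this illustration, and `exam_id` is assumed to be stored as strings.

```python
import pandas as pd

# Sample data from the question (exam_id assumed to be strings, keeping the leading zero).
df = pd.DataFrame({
    "exam_id": ["01", "02", "03", "01", "02", "03", "01"],
    "student": list("abcdefg"),
    "semester": [1, 2, 3, 1, 2, 3, 1],
})

# Self merge: every pair of students who sat the same exam in the same semester.
# The left "student" column keeps its name; the right one becomes "student_other".
pairs = df.merge(df, on=["exam_id", "semester"], suffixes=("", "_other"))

# Drop rows where a student is paired with themselves.
pairs = pairs[pairs["student"] != pairs["student_other"]]

# Count shared exams per ordered pair. There is no row-wise sort here,
# so (a, d) and (d, a) are kept as separate rows.
shared = (
    pairs.rename(columns={"student_other": "shared_exam_with"})
         .groupby(["student", "shared_exam_with", "semester"])
         .size()
         .reset_index(name="number_of_shared_exam")
)

print(shared)
```

With the sample data this should give ten rows, one per ordered pair, each with a count of 1. If two students shared several exams in the same semester, the count would be the number of those exams.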

Upvotes: 2
