Sheron
Sheron

Reputation: 625

Python interaction between rows and columns

I have the following program:

df = pd.DataFrame({
    'student':['a'] * 3 + ['b'] * 3 + ['c'] * 4,
    'semester':[1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'passed_exam':[True, False] * 5,
    'exam': [
        'French', 'English', 'Italian', 'Chinese', 'Russian',
        'German', 'Chinese', 'Spanish', 'English', 'French'
    ]
})

print (df)

      passed_exam   exam      semester student
    0        True   French          1       a
    1       False   English         1       a
    2        True   Italian         2       a
    3       False   Chinese         2       b
    4        True   Russian         1       b
    5       False   German          1       b
    6        True   Chinese         2       c
    7       False   Spanish         2       c
    8        True   English         2       c
    9       False   French          2       c

Does anybody know how to find the number of students that each student interacted with (through exams)?

Something like this:

      passed_exam   exam      semester student   total_st
    0        True   French          1       a         1
    1       False   English         1       a         1
    2        True   Italian         2       a         1
    3       False   Chinese         2       b         1
    4        True   Russian         1       b         1
    5       False   German          1       b         1
    6        True   German          2       c         2
    7       False   Spanish         2       c         2
    8        True   English         2       c         2 
    9       False   French          2       c         2

Thank you in advance!

Upvotes: 1

Views: 100

Answers (3)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210972

IIUC you can do it this way:

In [116]: df['total_st'] = df.groupby(['exam','semester'])['student'].transform('size')

In [117]: df
Out[117]:
  passed_exam     exam  semester student  total_st
0        True   French         1       a         1
1       False  English         1       a         1
2        True  Italian         2       a         1
3       False  Chinese         2       b         2
4        True  Russian         1       b         1
5       False   German         1       b         1
6        True  Chinese         2       c         2
7       False  Spanish         2       c         1
8        True  English         2       c         1
9       False   French         2       c         1

Upvotes: 1

Dim'
Dim'

Reputation: 518

The way I understand your question, you want the 'total_st' column correspond to the number of students a student interracted with for a given exam. For instance, if the exam 'French' had 4 student ('a', 'b', 'c', 'd'), the student 'a' interacted with 3 students. Am I right?

If so, here is a solution. First, let's forget about semesters to simplify the problem, and let's consider the following example:

df = pd.DataFrame({
        'student': ['a'] * 3 + ['b'] * 3 + ['c'] * 4,
        'exam': [
            'Chinese', 'English', 'Spanish', 'Chinese', 'Spanish',
            'Spanish', 'Chinese', 'Spanish', 'English', 'Chinese'
        ],
        'passed_exam':[True, False] * 5
})
print(df)

          exam passed_exam student
    0  Chinese        True       a
    1  English       False       a
    2  Spanish        True       a
    3  Chinese       False       b
    4  Spanish        True       b
    5  Spanish       False       b
    6  Chinese        True       c
    7  Spanish       False       c
    8  English        True       c
    9  Chinese       False       c

Now, we can compute a Series that associate the number of students for each exam, using groupby:

d = df.groupby(['exam'])['student'].count()
print(d)

    exam
    Chinese    4
    English    2
    Spanish    4
    Name: student, dtype: int64

We get the number of students a student interacted with by removing 1 to each value:

d = d - 1

Finally, we create 'total_st' column using apply, and assign it to the initial dataframe:

total_st = df.apply(lambda x: d.loc[x['exam']], axis=1)
df = df.assign(total_st=total_st)
print(df)

          exam passed_exam student  total_st
    0  Chinese        True       a         3
    1  English       False       a         1
    2  Spanish        True       a         3
    3  Chinese       False       b         3
    4  Spanish        True       b         3
    5  Spanish       False       b         3
    6  Chinese        True       c         3
    7  Spanish       False       c         3
    8  English        True       c         1
    9  Chinese       False       c         3

Upvotes: 1

Sergey Bushmanov
Sergey Bushmanov

Reputation: 25249

I interpret "number of students that each student interacted with (through exams)" as # of students who were sitting the same exam.

Then, it seems like:

df1 = (df
       .groupby(["exam","semester"], as_index=False)["student"].agg("count")
       .rename(columns={"student":"total_st"}))
df.merge(df1).sort_values(["semester","student"])

    passed_exam exam    semester    student total_st
0   True    French  1   a   1
1   False   English 1   a   1
5   True    Russian 1   b   1
6   False   German  1   b   1
2   True    Italian 2   a   1
3   False   Chinese 2   b   2
4   True    Chinese 2   c   2
7   False   Spanish 2   c   1
8   True    English 2   c   1
9   False   French  2   c   1

Upvotes: 1

Related Questions