Using a Dataframe to Display Column Data as an Individual Column

Question

I had a big dataset which I munged and cleansed to finally look like shown in the image. The clean dataset is given below.

RegistrationNo  StudentName   Date        ClassAttend
16SCSE101002    Eric Brown    12-03-2019    1
16SCSE101002    Eric Brown    08-03-2019    1
16SCSE101002    Eric Brown    12-03-2019    1
16SCSE101002    Eric Brown    15-03-2019    1
16SCSE101002    Eric Brown    08-03-2019    1
16SCSE101002    Eric Brown    22-03-2019    0
16SCSE101002    Eric Brown    22-03-2019    0
16SCSE101002    Eric Brown    26-03-2019    1 
16SCSE101005    Derek Oldman    15-02-2019  1
16SCSE101005    Derek Oldman    12-02-2019  0
16SCSE101005    Derek Oldman    15-02-2019  1
16SCSE101005    Derek Oldman    29-03-2019  1
16SCSE101005    Derek Oldman    12-02-2019  0
16SCSE101005    Derek Oldman    29-03-2019  1
16SCSE101005    Derek Oldman    02-04-2019  1
16SCSE101005    Derek Oldman    02-04-2019  1
16SCSE101005    Derek Oldman    12-03-2019  1

In the data, information is given of Students with their respective Registration numbers, Names, Date of Classes and number of classes attended on that day.

I want to display each data in the column 'Date' as an individual Column.

The idea is to display the number of Classes attended by each Student on each Date. The student is identified by thier unique Registration numbers.

I want to view my dataframe with headers like below:

RegistrationNo | StudentName | Date1 | Date2 | Date3. .......  | DateN

followed by their individual datas provided under each of the above columns.

For Example: Sample Input:

16SCSE101002
12/3/2019

16SCSE101005
15/3/2019

Sample Output:

16SCSE101002
2

16SCSE101005
0

The user enters the Registration No. and they should be able to see the attendance of that student on each day of the classes conducted.

How can I go ahead with something like this?

mortysporty · Accepted Answer

It is unclear to me why you want to convert the dataframe to have date1, date2 etc. in the header. If the purpose is simply to get the registered students attendence on a given day I suggest two other options (there are probably hundreds of ways of doing this).

from datetime import datetime
import pandas as pd
# Small subset of your data
df = pd.DataFrame({'RegistrationNo': ['16SCSE101002']*7,
                   'StudentName': ['DEREK TIMBER'] * 7,
                   'Date': ['12/03/2019', '08/03/2019', '12/03/2019', '15/03/2019', 
'08/03/2019','22/03/2019', '22/03/2019'],
                   'ClassAttend': [1,1,1,1,1,0,0]})
# Convert the 'string-dates' to pandas.datetime
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

Option 1, to get number of classes attended for a given registration number on a given date, use a simple filter and the sum-function

reg_no = '16SCSE101002'
reg_date = datetime(2019, 3, 12)

# This gives you what you want
df[(df['RegistrationNo'] == reg_no) & (df['Date']==reg_date)]['ClassAttend'].sum()

Or if you want the whole list with all the sums for each day do a groupby-sum

df_summed = df.groupby(['RegistrationNo', 'StudentName', 'Date'])['ClassAttend'].sum().reset_index()

  RegistrationNo   StudentName       Date  ClassAttend
0   16SCSE101002  DEREK TIMBER 2019-03-08            2
1   16SCSE101002  DEREK TIMBER 2019-03-12            2
2   16SCSE101002  DEREK TIMBER 2019-03-15            1
3   16SCSE101002  DEREK TIMBER 2019-03-22            0

And then you can use the previusly mentioned filter without the sum.

If you have very strong reasons for doing it the way you suggested please let me know (so I at least can delete this answer :)).

Using a Dataframe to Display Column Data as an Individual Column

Answers (2)

Related Questions