Reputation: 445
I had a big dataset which I munged and cleansed to finally look like shown in the image. The clean dataset is given below.
RegistrationNo StudentName Date ClassAttend
16SCSE101002 Eric Brown 12-03-2019 1
16SCSE101002 Eric Brown 08-03-2019 1
16SCSE101002 Eric Brown 12-03-2019 1
16SCSE101002 Eric Brown 15-03-2019 1
16SCSE101002 Eric Brown 08-03-2019 1
16SCSE101002 Eric Brown 22-03-2019 0
16SCSE101002 Eric Brown 22-03-2019 0
16SCSE101002 Eric Brown 26-03-2019 1
16SCSE101005 Derek Oldman 15-02-2019 1
16SCSE101005 Derek Oldman 12-02-2019 0
16SCSE101005 Derek Oldman 15-02-2019 1
16SCSE101005 Derek Oldman 29-03-2019 1
16SCSE101005 Derek Oldman 12-02-2019 0
16SCSE101005 Derek Oldman 29-03-2019 1
16SCSE101005 Derek Oldman 02-04-2019 1
16SCSE101005 Derek Oldman 02-04-2019 1
16SCSE101005 Derek Oldman 12-03-2019 1
In the data, information is given of Students with their respective Registration numbers, Names, Date of Classes and number of classes attended on that day.
I want to display each data in the column 'Date' as an individual Column.
The idea is to display the number of Classes attended by each Student on each Date. The student is identified by thier unique Registration numbers.
I want to view my dataframe with headers like below:
RegistrationNo | StudentName | Date1 | Date2 | Date3. ....... | DateN
followed by their individual datas provided under each of the above columns.
For Example: Sample Input:
16SCSE101002
12/3/2019
16SCSE101005
15/3/2019
Sample Output:
16SCSE101002
2
16SCSE101005
0
The user enters the Registration No. and they should be able to see the attendance of that student on each day of the classes conducted.
How can I go ahead with something like this?
Upvotes: 1
Views: 321
Reputation: 2889
It is unclear to me why you want to convert the dataframe to have date1, date2 etc. in the header. If the purpose is simply to get the registered students attendence on a given day I suggest two other options (there are probably hundreds of ways of doing this).
from datetime import datetime
import pandas as pd
# Small subset of your data
df = pd.DataFrame({'RegistrationNo': ['16SCSE101002']*7,
'StudentName': ['DEREK TIMBER'] * 7,
'Date': ['12/03/2019', '08/03/2019', '12/03/2019', '15/03/2019',
'08/03/2019','22/03/2019', '22/03/2019'],
'ClassAttend': [1,1,1,1,1,0,0]})
# Convert the 'string-dates' to pandas.datetime
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
Option 1, to get number of classes attended for a given registration number on a given date, use a simple filter and the sum
-function
reg_no = '16SCSE101002'
reg_date = datetime(2019, 3, 12)
# This gives you what you want
df[(df['RegistrationNo'] == reg_no) & (df['Date']==reg_date)]['ClassAttend'].sum()
Or if you want the whole list with all the sums for each day do a groupby
-sum
df_summed = df.groupby(['RegistrationNo', 'StudentName', 'Date'])['ClassAttend'].sum().reset_index()
RegistrationNo StudentName Date ClassAttend
0 16SCSE101002 DEREK TIMBER 2019-03-08 2
1 16SCSE101002 DEREK TIMBER 2019-03-12 2
2 16SCSE101002 DEREK TIMBER 2019-03-15 1
3 16SCSE101002 DEREK TIMBER 2019-03-22 0
And then you can use the previusly mentioned filter without the sum.
If you have very strong reasons for doing it the way you suggested please let me know (so I at least can delete this answer :)).
Upvotes: 1
Reputation: 2581
You can play around a bit with the pivot
function.
For example, you can try:
df.pivot(index='RegistrationNo', columns='Date', values='ClassAttend')
More examples here.
You should probably aggregate the number of classes attended by each student first (depends really on the exact thing you want to achieve - it's not really clear from your question and the screenshot of the data).
Upvotes: 0