puhles
puhles

Reputation: 21

Python adding missing rows based on a single column

I have a dataframe with two colums of interest: a column with names (repeated for each course), and a column with the course name they've taken or started. The problem is, if I use the dataframe as-is, if a person has not attempted the course, they won't have a value showing for that course at all, and therefore I won't be able to tell who is missing the course. Is there a way to auto generate the missing rows with a name? The other columns can be N/A or anything else.

Dataframe:

Name Course Title Completion Date
Mark Math 1/1/2021
Mark English 2/1/2021
Mark Econ 3/1/2021
David Math 7/1/2021
David English 4/1/2021
Cindy Math 6/1/2021
Cindy English 9/1/2021
Cindy Econ 11/1/2021

What I'm looking for:

Name Course Title Completion Date
Mark Math 1/1/2021
Mark English 2/1/2021
Mark Econ 3/1/2021
David Math 7/1/2021
David English 4/1/2021
David Econ N/A
Cindy Math 6/1/2021
Cindy English 9/1/2021
Cindy Econ 11/1/2021

Upvotes: 1

Views: 69

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

You can make missing values explicit with the complete function from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
df.complete('Name', 'Course Title')

    Name Course Title Completion Date
0  Cindy         Econ       11/1/2021
1  Cindy      English        9/1/2021
2  Cindy         Math        6/1/2021
3  David         Econ             NaN
4  David      English        4/1/2021
5  David         Math        7/1/2021
6   Mark         Econ        3/1/2021
7   Mark      English        2/1/2021
8   Mark         Math        1/1/2021

Upvotes: 1

U13-Forward
U13-Forward

Reputation: 71570

Try with pivot and stack with reset_index:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             NaN
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>> 

If you specifically want it to be N/A, try:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date').fillna('N/A')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             N/A
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>> 

Upvotes: 1

Related Questions