Reputation: 49
Python rookie here
Have two columns; 'Project Number' (Project 1, Project 2, Project 3... etc) and 'Date Expected' (DD/MM/YYYY) and a lot of the 'Date Expected' values are NaN and I want to calculate the percentage of NaN for each Project Number, i.e. Project 1 has 29% NaN, Project 2 is 55%, Project 3 is 40%, etc.
I know for percentages of an individual column you can use the below code but I'm struggling to work out how to do the above?
df['Date Expected'].value_counts(normalize=True)*100
Thanks in advance for any assistance
Upvotes: 1
Views: 274
Reputation: 18306
Try:
nan_pers = df.groupby("Project Number")["Date Expected"]\
.value_counts(normalize=True, dropna=False)\
.reset_index(level="Date Expected", name="NaN percentage") * 100
nan_pers = nan_pers[nan_pers["Date Expected"].isna()].drop(columns="Date Expected")
Passing dropna=False
to value_counts
indicates that it should report NaN
's count too. Then we have a multi index dataframe and so we reset_index
on the "Date Expected"
level and rename its values (which are normalized counts) to be NaN percentage
and it becomes a column on its own. In the second line, we drop the non-NaN
values along with the column that has the unique Date Expected
values, so we are left with a structure like
NaN percentage
Project Number
Project 1 50.0
Project 2 20.0
Project 3 40.0
Upvotes: 1