NICode
NICode

Reputation: 49

Percentage of NaN values for corresponding Values in Column

Python rookie here

Have two columns; 'Project Number' (Project 1, Project 2, Project 3... etc) and 'Date Expected' (DD/MM/YYYY) and a lot of the 'Date Expected' values are NaN and I want to calculate the percentage of NaN for each Project Number, i.e. Project 1 has 29% NaN, Project 2 is 55%, Project 3 is 40%, etc.

I know for percentages of an individual column you can use the below code but I'm struggling to work out how to do the above?

df['Date Expected'].value_counts(normalize=True)*100

Thanks in advance for any assistance

Upvotes: 1

Views: 274

Answers (1)

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

Try:

nan_pers = df.groupby("Project Number")["Date Expected"]\
             .value_counts(normalize=True, dropna=False)\
             .reset_index(level="Date Expected", name="NaN percentage") * 100

nan_pers = nan_pers[nan_pers["Date Expected"].isna()].drop(columns="Date Expected")

Passing dropna=False to value_counts indicates that it should report NaN's count too. Then we have a multi index dataframe and so we reset_index on the "Date Expected" level and rename its values (which are normalized counts) to be NaN percentage and it becomes a column on its own. In the second line, we drop the non-NaN values along with the column that has the unique Date Expected values, so we are left with a structure like

                NaN percentage
Project Number
Project 1       50.0
Project 2       20.0
Project 3       40.0

Upvotes: 1

Related Questions