How to count null values for each columns as well as finding percentage in pandas dataframe?

Question

So I have a dataframe that I read from a csv file and assigned it to 'ski_data'.

I am just a little confused on the wording of the instruction. It says:

Count (using .sum()) the number of missing values (.isnull()) in each column of ski_data as well as the percentages (using .mean() instead of .sum()) and order them using sort_values.

Call pd.concat to present these in a single table (DataFrame) with the helpful column names 'count' and '%'

The only way I would go about doing this is by copying ski_data into a new df but the problem now is I have a new dataframe and after transposing it, the new dataframe has the name of each column as index and a number corresponding to the amount of null values it has. Also, I don't know how to compute the mean the way the question is asking because it assumes I can write it all into one line.

I assume it is largely a syntax issue although I couldn't find any helpful resource online. Was wondering if anyone can help.

Laurent · Accepted Answer

You could try this:

import pandas as pd

# Toy dataframe
ski_data = pd.DataFrame(
    {
        "A": [1, 1, 1],
        "B": [2, 2, None],
        "C": ["markers", "", "markers"],
        "D": [None, 2, None],
        "E": [4, "", 4],
    }
)

counts = ski_data.isna().sum()
print(counts.sort_values())
# Outputs
A    0      
C    0      
E    0      
B    1      
D    2

percentages = round(ski_data.isna().mean() * 100, 1)
print(percentages.sort_values())
# Outputs
A     0.0     
C     0.0     
E     0.0     
B    33.3     
D    66.7

null_values = pd.concat([counts, percentages], axis=1, keys=["count", "%"])
print(null_values)
# Outputs
   count     %
A      0   0.0
B      1  33.3
C      0   0.0
D      2  66.7
E      0   0.0

How to count null values for each columns as well as finding percentage in pandas dataframe?

Answers (1)

Related Questions