Reputation: 430
So I have a dataframe that I read from a csv file and assigned it to 'ski_data'.
I am just a little confused on the wording of the instruction. It says:
Count (using .sum()) the number of missing values (.isnull()) in each column of ski_data as well as the percentages (using .mean() instead of .sum()) and order them using sort_values.
Call
pd.concat
to present these in a single table (DataFrame) with the helpful column names 'count' and '%'
The only way I would go about doing this is by copying ski_data into a new df but the problem now is I have a new dataframe and after transposing it, the new dataframe has the name of each column as index and a number corresponding to the amount of null values it has. Also, I don't know how to compute the mean the way the question is asking because it assumes I can write it all into one line.
I assume it is largely a syntax issue although I couldn't find any helpful resource online. Was wondering if anyone can help.
Upvotes: 4
Views: 11380
Reputation: 13488
You could try this:
import pandas as pd
# Toy dataframe
ski_data = pd.DataFrame(
{
"A": [1, 1, 1],
"B": [2, 2, None],
"C": ["markers", "", "markers"],
"D": [None, 2, None],
"E": [4, "", 4],
}
)
counts = ski_data.isna().sum()
print(counts.sort_values())
# Outputs
A 0
C 0
E 0
B 1
D 2
percentages = round(ski_data.isna().mean() * 100, 1)
print(percentages.sort_values())
# Outputs
A 0.0
C 0.0
E 0.0
B 33.3
D 66.7
null_values = pd.concat([counts, percentages], axis=1, keys=["count", "%"])
print(null_values)
# Outputs
count %
A 0 0.0
B 1 33.3
C 0 0.0
D 2 66.7
E 0 0.0
Upvotes: 8