trenzalore888

Reputation: 35

Creating a function for Pandas that takes arguments (df, columnname) and calculates the null percentage

I am learning Python's Pandas library using Kaggle's Titanic tutorial. I am trying to create a function that calculates the percentage of nulls in a column.

My attempt below appears to print the entire dataframe instead of the null percentage for the specified column:

def null_percentage_calculator(df,nullcolumn):
    df_column_null = df[nullcolumn].isnull().sum()
    df_column_null_percentage = np.ceil((df_column_null /testtotal)*100)
    print("{} percent of {} {} are NaN values".format(df_column_null_percentage,df,nullcolumn))

null_percentage_calculator(train,"Age")

My previous question (my very first on Stack Overflow) was about a similar problem, and it was explained to me that the .index method in pandas is undesirable and that I should use other methods like [ ] and .loc to refer to the column explicitly.

So I have tried this:

df_column_null=[df[nullcolumn]].isnull().sum()

I have also tried

df_column_null=df[nullcolumn]df[nullcolumn].isnull().sum()

I am struggling to understand this aspect of Pandas. My non-function method works fine:

Train_Age_Nulls = train["Age"].isnull().sum()
Train_Age_Nulls_percentage = (Train_Age_Nulls/traintotal)*100
Train_Age_Nulls_percentage_rounded = np.ceil(Train_Age_Nulls_percentage)
print("{} percent of Train's Age are NaN values".format(Train_Age_Nulls_percentage_rounded))

Could anyone let me know where I am going wrong?

Upvotes: 0

Views: 54

Answers (1)

Stael

Reputation: 2689

def null_percentage_calculator(df,nullcolumn):
    df_column_null = df[nullcolumn].isnull().sum()
    df_column_null_percentage = np.ceil((df_column_null /testtotal)*100)
    # what is testtotal?
    print("{} percent of {} {} are NaN values".format(df_column_null_percentage,df,nullcolumn))

I would do this with:

def null_percentage_calculator(df, nullcolumn):
    nulls = df[nullcolumn].isnull().sum()
    # float() guards against Python 2 integer division
    pct = float(nulls) / len(df[nullcolumn])
    # multiply by 100 to report a percentage rather than a fraction
    print("{} percent of column {} are null".format(pct * 100, nullcolumn))
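Note that the original version passed df itself to format(), which is why the DataFrame's contents were printed; the message only needs the column name. As a rough sketch of an alternative, the function could return the percentage rather than print it, using the mean of the boolean mask as the null fraction. The name null_percentage and the small train frame below are made up for illustration and are not the real Titanic data.

```python
import numpy as np
import pandas as pd


def null_percentage(df, column):
    """Return the percentage of null values in df[column] as a float."""
    # isnull() yields booleans; their mean is the fraction of nulls.
    return df[column].isnull().mean() * 100


# Hypothetical stand-in for the Titanic training data.
train = pd.DataFrame({"Age": [22.0, np.nan, 26.0, np.nan]})
pct = null_percentage(train, "Age")
print("{:.1f} percent of column {} are null".format(pct, "Age"))
```

Returning the value leaves rounding (for example with np.ceil) and formatting up to the caller.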

Beware of Python 2 integer division, where 63/180 = 0. (In Python 3, / always returns a float.)

If you want a float out, you have to put a float in.
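To make the division point concrete, here is a small sketch using the same numbers. It assumes Python 3, where // is floor division and / is true division; on Python 2, plain / between two ints behaves like the // line.

```python
nulls, total = 63, 180

floor_result = nulls // total            # floor division drops the fraction
true_result = nulls / total              # true division on Python 3
explicit_result = float(nulls) / total   # float result on Python 2 and 3

print(floor_result, true_result, explicit_result)
```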

Upvotes: 1
