user12550148
user12550148

Reputation: 53

Python Pandas dataframe find missing values

I'm trying to find missing values and then drop off missing values. Tried looking for the data online but can't seem to find the answer.

Extracted Dataframe:

Extracted Dataframe

In the df, for 1981 and 1982, it should be '-', i.e. missing values. I would like to find the missing values then drop off the missing values.

Exported Dataframe using isnull: enter image description here

I used df.isnull() but in 1981 and 1982, it's detected as 'False' which means there's data. But it should be '-', therefore considered as missing values.

I had pasted my code below. What am I missing out?

import pandas as pd

mydf = pd.read_excel('abc.xlsx', sep='\t')

df1 = mydf.set_index('Variables')
df = df1[0:10]
print(df)
print(df.isnull())

Upvotes: 4

Views: 6985

Answers (3)

Dinesh vishe
Dinesh vishe

Reputation: 3618

features_with_na=[features for features in df.columns if df[features].isnull().sum()>=1] for feature in features_with_na: print(feature,np.round(df[feature].isnull().mean()*100,5), '% missing values')

Upvotes: 0

Gonçalo Peres
Gonçalo Peres

Reputation: 13622

The question has two points: finding which columns have missing values and drop those values.

To find the missing values on a dataframe df

missing = df.isnull().sum()
print(missing)

To drop those missing values, apart from @jezrael's consideration, if that doesn't help, I suggest you to use dropna:

Drop the rows where all elements are missing.

df.dropna(how='all')

Drop the columns where at least one element is missing.

df.dropna(axis='columns')

Upvotes: 7

jezrael
jezrael

Reputation: 863651

Missing values are not -.

So for missing values use na_values parameter in read_excel for converting - to missing values NaNs:

mydf = pd.read_excel('abc.xlsx', sep='\t', na_values='-')

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

Upvotes: 2

Related Questions