Reputation: 53
I'm trying to find the missing values and then drop them. I tried looking online but can't seem to find the answer.
Extracted Dataframe:
In the df, the entries for 1981 and 1982 are '-', i.e. missing values. I would like to find those missing values and then drop them.
Exported Dataframe using isnull:
I used df.isnull(), but for 1981 and 1982 it returns 'False', which means there's data. However, those cells are '-', so they should be treated as missing values.
I have pasted my code below. What am I missing?
import pandas as pd

mydf = pd.read_excel('abc.xlsx', sep='\t')  # read the Excel file
df1 = mydf.set_index('Variables')           # index by the 'Variables' column
df = df1[0:10]                              # keep the first 10 rows
print(df)
print(df.isnull())                          # check which cells are detected as missing
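For reference, a minimal reproduction of what I'm seeing, using a toy frame of my own (not the real data) where the missing entries are the string '-':

import pandas as pd

toy = pd.DataFrame({'1981': ['-', 1.2], '1982': ['-', 3.4]})
print(toy.isnull())  # the '-' cells show False, because '-' is an ordinary string, not NaN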
Upvotes: 4
Views: 6985
Reputation: 3618
import numpy as np

features_with_na = [feature for feature in df.columns if df[feature].isnull().sum() >= 1]
for feature in features_with_na:
    print(feature, np.round(df[feature].isnull().mean() * 100, 5), '% missing values')
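For illustration, a sketch of what this prints on a toy frame (the DataFrame below is my own example, not the question's data):

import numpy as np
import pandas as pd

df = pd.DataFrame({'1981': [np.nan, np.nan, 1.0], '1982': [2.0, 3.0, 4.0]})
features_with_na = [f for f in df.columns if df[f].isnull().sum() >= 1]
for f in features_with_na:
    print(f, np.round(df[f].isnull().mean() * 100, 5), '% missing values')
# prints: 1981 66.66667 % missing values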
Upvotes: 0
Reputation: 13622
The question has two parts: finding which columns have missing values and dropping those values.
To find the missing values in a dataframe df:
missing = df.isnull().sum()
print(missing)
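For illustration, a quick sketch of what that sum looks like on a toy frame (the data here is my own example):

import numpy as np
import pandas as pd

df = pd.DataFrame({'1980': [1.0, 2.0], '1981': [np.nan, np.nan], '1982': [np.nan, 3.0]})
missing = df.isnull().sum()
print(missing)
# 1980    0
# 1981    2
# 1982    1
# dtype: int64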
To drop those missing values, if @jezrael's suggestion alone doesn't help, I suggest you use dropna:
Drop the rows where all elements are missing.
df.dropna(how='all')
Drop the columns where at least one element is missing.
df.dropna(axis='columns')
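A minimal sketch of both calls on a toy frame (the data is illustrative, not from the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, np.nan], 'b': [4.0, np.nan, 5.0]})
print(df.dropna(how='all'))        # keeps rows 0 and 2; row 1 is all-NaN and is dropped
print(df.dropna(axis='columns'))   # drops both columns here, since each has at least one NaN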
Upvotes: 7
Reputation: 863651
Missing values are not '-'. So for missing values use the na_values parameter in read_excel, for converting '-' to missing values (NaNs):
mydf = pd.read_excel('abc.xlsx', sep='\t', na_values='-')
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
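Putting it together, a minimal sketch of the full pipeline, assuming na_values='-' is all you need on the read side; the final dropna(how='all') is my assumption about which rows to drop, following the other answer:

import pandas as pd

# na_values='-' converts '-' cells to NaN while reading; the file and 'Variables' column come from the question
mydf = pd.read_excel('abc.xlsx', na_values='-')
df = mydf.set_index('Variables')[0:10]
print(df.isnull())          # the 1981 and 1982 cells should now show True
df = df.dropna(how='all')   # assumption: drop rows that are entirely missing
print(df)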
Upvotes: 2