Reputation: 25366
I am using the following code to remove some rows with missing data in pandas:
df = df.replace(r'^\s+$', np.nan, regex=True)
df = df.replace(r'^\t+$', np.nan, regex=True)
df = df.dropna()
However, some cells in the data frame still look blank/empty. Why is this happening? Is there any way to get rid of rows with such empty/blank cells? Thanks!
Upvotes: 3
Views: 10510
Reputation: 1731
I'm providing code with input and output data:
Input:
Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2    NaN  30.0     New York
3  Diana  22.0          NaN
4  Ethan   NaN      Chicago
Code:
import pandas as pd
import numpy as np
data = {
'Name': ['Alice', 'Bob', np.nan, 'Diana', 'Ethan'],
'Age': [25, np.nan, 30, 22, np.nan],
'City': ['New York', 'Los Angeles', 'New York', np.nan, 'Chicago']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Drop rows that have a missing value in any of the listed columns
df_cleaned = df.dropna(subset=['Name', 'Age', 'City'])
print("DataFrame after removing rows with missing data:")
print(df_cleaned)
Output:
DataFrame after removing rows with missing data:
    Name   Age      City
0  Alice  25.0  New York
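Since the snippet above lists every column in subset, it behaves the same as a plain dropna(). As a minimal sketch of what subset changes, the following reuses the same sample data and checks only the Age column; the variable name age_known is just an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', np.nan, 'Diana', 'Ethan'],
    'Age': [25, np.nan, 30, 22, np.nan],
    'City': ['New York', 'Los Angeles', 'New York', np.nan, 'Chicago'],
})

# Only 'Age' is checked, so rows missing just 'Name' or 'City' are kept.
age_known = df.dropna(subset=['Age'])
print(age_known)
```

Here rows 1 and 4 are dropped because their Age is NaN, while rows 2 and 3 survive even though Name or City is missing.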
Upvotes: 0
Reputation: 2005
Depending on your version of pandas you may do:
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    Determine if rows or columns which contain missing values are removed.
    0, or ‘index’ : Drop rows which contain missing values.
    1, or ‘columns’ : Drop columns which contain missing value.
    Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes.
(Source: pandas documentation)
So, for now, to drop rows with missing values, the following should work:
df = df.dropna(axis=0)
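One caveat that may explain the cells that still look blank: dropna only removes values pandas treats as missing (NaN, None, NaT). An empty or whitespace-only string is an ordinary value and is left alone. A small sketch with made-up data (the column names x and y are hypothetical) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one real NaN, one empty string, one single space.
df = pd.DataFrame({'x': ['a', '', np.nan, ' '], 'y': [1, 2, 3, 4]})

# isna() flags only the real NaN, not '' or ' '.
print(df.isna().sum())

# dropna removes the NaN row; the blank-looking rows stay.
after = df.dropna(axis=0)
print(after)
```

If the blank cells in your data are strings like these, they need to be converted to NaN before dropna can remove them.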
Upvotes: 2
Reputation: 862911
You can use:
df = df.replace('', np.nan)
If you want to simplify your code, you can join the regexes with | and match empty strings with ^$:
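import numpy as np
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'),
                   'B': ['', 5, 4, 5, 5, 4],
                   'C': ['', ' ', ' ', 4, 2, 3],
                   'D': [1, 3, 5, 7, ' ', 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})
# Replace whitespace-only, tab-only, and empty strings with NaN
df = df.replace(r'^\s+$|^\t+$|^$', np.nan, regex=True)
print(df)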
   A    B    C    D  E  F
0 a NaN NaN 1.0 5 a
1 b 5.0 NaN 3.0 3 a
2 c 4.0 NaN 5.0 6 a
3 d 5.0 4.0 7.0 9 b
4 e 5.0 2.0 NaN 2 b
5 f 4.0 3.0 0.0 4 b
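To actually remove the rows, the replace step can be chained with dropna. The following is a minimal sketch on a trimmed version of the sample data; it assumes the blank cells are empty or whitespace-only strings, and it uses the shorter pattern ^\s*$, which is my substitution for the three-part regex (\s already matches tabs, and * also covers the empty string). The name cleaned is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data: '' and ' ' stand in for the blank-looking cells.
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': ['', 5, 4, 5, 5, 4],
    'C': ['', ' ', ' ', 4, 2, 3],
    'D': [1, 3, 5, 7, ' ', 0],
})

# Turn empty and whitespace-only strings into NaN, then drop those rows.
cleaned = df.replace(r'^\s*$', np.nan, regex=True).dropna()
print(cleaned)
```

Only the rows with no blank cell in any column remain (rows 3 and 5 here). If some columns are allowed to be blank, passing subset=[...] to dropna limits the check to the columns that matter.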
Upvotes: 4