How can I remove specific rows from a pandas DataFrame in an easy way?

Question

I have a pandas DataFrame in Python. I want to remove a row if it meets any of three conditions. First, columns 1 to 6 and 10 to 15 are all 'NA' in that row. Second, columns 1 to 3, 7 to 12, and 16 to 18 are all 'NA'. Third, columns 4 to 9 and 13 to 18 are all 'NA'. I wrote the following code to do this, but it didn't work:

data = pd.read_csv('data(2).txt',sep = "	",index_col = 'tracking_id')
num = len(data) + 1
for i in range(num):
    if (data.iloc[i,[0:5,9:14]] == 'NA') | (data.iloc[i,[0:11,15:17]] == 'NA)'\
    | (data.iloc[i,[3:8,12:17]] == 'NA'):
        data = data.drop(data.index[i], axis = 0)

The data is available via a link in the original post.

jezrael · Accepted Answer

You can use:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,18)))

df.iloc[0, np.r_[0:5,9:14]] = np.nan
df.iloc[2, np.r_[0:11,15:17]] = np.nan
df.iloc[3:5, np.r_[3:8,12:17]] = np.nan
print (df)
    0    1    2    3    4    5    6    7    8    9    10   11   12   13   14  \
0  NaN  NaN  NaN  NaN  NaN  0.0  4.0  2.0  5.0  NaN  NaN  NaN  NaN  NaN  8.0   
1  6.0  2.0  4.0  1.0  5.0  3.0  4.0  4.0  3.0  7.0  1.0  1.0  7.0  7.0  0.0   
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  2.0  5.0  1.0  8.0   
3  2.0  8.0  3.0  NaN  NaN  NaN  NaN  NaN  3.0  4.0  7.0  6.0  NaN  NaN  NaN   
4  7.0  6.0  6.0  NaN  NaN  NaN  NaN  NaN  6.0  6.0  0.0  7.0  NaN  NaN  NaN   

    15   16  17  
0  4.0  0.0   9  
1  2.0  9.0   9  
2  NaN  NaN   4  
3  NaN  NaN   5  
4  NaN  NaN   4
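
A note on why the sample uses NaN rather than the string 'NA': as far as I know, read_csv treats the token NA as a missing value by default, so after loading the file the cells hold NaN and a comparison with == 'NA' never matches. Below is a small sketch with made-up inline data. The column names c1 to c3 and the values are hypothetical, not the asker's file.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; "NA" marks the missing cells.
raw = "tracking_id\tc1\tc2\tc3\ng1\tNA\t1.5\tNA\ng2\t2.0\tNA\t3.0\n"

data = pd.read_csv(io.StringIO(raw), sep="\t", index_col="tracking_id")

# With default settings the "NA" tokens are parsed as NaN, not kept as strings.
is_string_na = data == "NA"   # element-wise comparison, never True here
is_missing = data.isnull()    # True exactly where the file had "NA"

print(is_string_na.values.any())
print(is_missing)
```

This is why the solution tests for missing values with isnull instead of comparing against 'NA'.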

First, check which values are NaN with isnull. Then select each group of columns with numpy.r_ and iloc, and use all to check whether every value in the group is True for each row. Combine the three resulting masks with | (or).

Finally, filter with boolean indexing, inverting the combined mask with ~:

mask = df.isnull()
# axis=1 checks across the selected columns, giving one True/False per row
m1 = mask.iloc[:, np.r_[0:5,9:14]].all(axis=1)
m2 = mask.iloc[:, np.r_[0:11,15:17]].all(axis=1)
m3 = mask.iloc[:, np.r_[3:8,12:17]].all(axis=1)
m = m1 | m2 | m3
print (m)
0     True
1    False
2     True
3     True
4     True
dtype: bool

df = df[~m]
print (df)
    0    1    2    3    4    5    6    7    8    9    10   11   12   13   14  \
1  6.0  2.0  4.0  1.0  5.0  3.0  4.0  4.0  3.0  7.0  1.0  1.0  7.0  7.0  0.0   

    15   16  17  
1  2.0  9.0   9
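
Note that Python slices are half-open, so np.r_[0:5,9:14] covers positions 0 to 4 and 9 to 13, which follows the slices in the question's code. If the intended groups are the ones the question describes in words (columns 1 to 6 and 10 to 15, and so on, counted from 1 and inclusive), the slices would need to be wider. Here is a minimal sketch under that assumption. The helper name all_na_in and the 1-based inclusive reading of the ranges are mine, not part of the original code, and the sample frame is made up.

```python
import numpy as np
import pandas as pd


def all_na_in(mask, ranges):
    """Return a boolean Series: True where every column in `ranges` is NA.

    `ranges` holds (first, last) column numbers, 1-based and inclusive.
    """
    # Convert each 1-based inclusive range to 0-based positions.
    positions = np.concatenate([np.arange(first - 1, last) for first, last in ranges])
    return mask.iloc[:, positions].all(axis=1)


# Made-up float frame so NaN can be assigned without changing dtypes.
df = pd.DataFrame(np.arange(4 * 18, dtype=float).reshape(4, 18))
df.iloc[0, np.r_[0:6, 9:15]] = np.nan          # condition 1: columns 1-6 and 10-15
df.iloc[1, np.r_[0:3, 6:12, 15:18]] = np.nan   # condition 2: columns 1-3, 7-12, 16-18
df.iloc[2, np.r_[3:9, 12:18]] = np.nan         # condition 3: columns 4-9 and 13-18
# Row 3 is left complete.

mask = df.isnull()
drop = (
    all_na_in(mask, [(1, 6), (10, 15)])
    | all_na_in(mask, [(1, 3), (7, 12), (16, 18)])
    | all_na_in(mask, [(4, 9), (13, 18)])
)
cleaned = df[~drop]
print(cleaned.index.tolist())
```

Writing the ranges as (first, last) pairs keeps them close to the wording of the question, which makes off-by-one mistakes in the slices easier to spot.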
