MLP99
MLP99

Reputation: 63

How to delete rows in Pandas after a certain value?

I replicated a Pandas series with the following code:

data = np.array([1, 2, 3, 4, 5, np.nan, np.nan, np.nan, 9,10,11,12,13,14])
  
ser = pd.Series(data)
print(ser)

I would like to select only the columns before the NaN values so that I only get the values 1,2,3,4,5. How should I do that?

Upvotes: 0

Views: 120

Answers (2)

jezrael
jezrael

Reputation: 862591

Test missing values with Series.isna and add Series.cummax for repeat Trues after first match and last invert mask by ~, filter in boolean indexing:

a = ser[~ser.isna().cummax()]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Alternative solution with cumulative sum:

a = ser[ser.isna().cumsum().eq(0)]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Details:

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cumsum(),
                            invert = ser.isna().cumsum().eq(0)))
       0  testna  cummax  invert
0    1.0   False       0    True
1    2.0   False       0    True
2    3.0   False       0    True
3    4.0   False       0    True
4    5.0   False       0    True
5    NaN    True       1   False
6    NaN    True       2   False
7    NaN    True       3   False
8    9.0   False       3   False
9   10.0   False       3   False
10  11.0   False       3   False
11  12.0   False       3   False
12  13.0   False       3   False
13  14.0   False       3   False

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cummax(),
                            test0 = ~ser.isna().cummax()))

       0  testna  cummax   test0
0    1.0   False   False    True
1    2.0   False   False    True
2    3.0   False   False    True
3    4.0   False   False    True
4    5.0   False   False    True
5    NaN    True    True   False
6    NaN    True    True   False
7    NaN    True    True   False
8    9.0   False    True   False
9   10.0   False    True   False
10  11.0   False    True   False
11  12.0   False    True   False
12  13.0   False    True   False
13  14.0   False    True   False

Upvotes: 1

mozway
mozway

Reputation: 260490

Use a boolean mask to slice the series.

You have two options, check is the value is not NA with notna and extend the False values after the first True with Series.cummin.

ser[ser.notna().cummin()]

output:

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Or, test if the values are NA with isna, then extend the True values after the first True with Series.cummax, then invert the mask with ~:

ser[~ser.isna().cummax()]
visual representation of how it works:
    data  notna  notna+cummin   isna  isna+cummax  ~(isna+cummax)
0    1.0   True          True  False        False            True
1    2.0   True          True  False        False            True
2    3.0   True          True  False        False            True
3    4.0   True          True  False        False            True
4    5.0   True          True  False        False            True
5    NaN  False         False   True         True           False
6    NaN  False         False   True         True           False
7    NaN  False         False   True         True           False
8    9.0   True         False  False         True           False
9   10.0   True         False  False         True           False
10  11.0   True         False  False         True           False
11  12.0   True         False  False         True           False
12  13.0   True         False  False         True           False
13  14.0   True         False  False         True           False

Upvotes: 1

Related Questions