Cedric Zoppolo
Cedric Zoppolo

Reputation: 4743

How to remove columns after any row has a NaN value in Python pandas dataframe

Toy example code

Let's say I have following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[11,21,31], "B":[12,22,32], "C":[np.nan,23,33], "D":[np.nan,24,34], "E":[15,25,35]})

Which would return:

>>> df
    A   B     C     D   E
0  11  12   NaN   NaN  15
1  21  22  23.0  24.0  25
2  31  32  33.0  34.0  35

Remove all columns with nan values

I know how to remove all the columns which have any row with a nan value like this:

out1 = df.dropna(axis=1, how="any")

Which returns:

>>> out1
    A   B   E
0  11  12  15
1  21  22  25
2  31  32  35

Expected output

However what I expect is to remove all columns after a nan value is found. In the toy example code the expected output would be:

    A   B
0  11  12
1  21  22
2  31  32

Question

How can I remove all columns after a nan is found within any row in a pandas DataFrame ?

Upvotes: 3

Views: 1267

Answers (2)

Paul H
Paul H

Reputation: 68146

What I would do:

  1. check every element for being null/not null
  2. cumulative sum every row across the columns
  3. check any for every column, across the rows
  4. use that result as an indexer:
df.loc[:, ~df.isna().cumsum(axis=1).any(axis=0)]

Give me:

    A   B
0  11  12
1  21  22
2  31  32

Upvotes: 4

Cedric Zoppolo
Cedric Zoppolo

Reputation: 4743

I could find a way as follows to get the expected output:

colFirstNaN = df.isna().any(axis=0).idxmax() # Find column that has first NaN element in any row
indexColLastValue = df.columns.tolist().index(colFirstNaN) -1
ColLastValue = df.columns[indexColLastValue]
out2 = df.loc[:, :ColLastValue]

And the output would be then:

>>> out2
    A   B
0  11  12
1  21  22
2  31  32

Upvotes: 0

Related Questions