Reputation: 4743
Let's say I have following DataFrame
:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[11,21,31], "B":[12,22,32], "C":[np.nan,23,33], "D":[np.nan,24,34], "E":[15,25,35]})
Which would return:
>>> df
A B C D E
0 11 12 NaN NaN 15
1 21 22 23.0 24.0 25
2 31 32 33.0 34.0 35
nan
valuesI know how to remove all the columns which have any row with a nan
value like this:
out1 = df.dropna(axis=1, how="any")
Which returns:
>>> out1
A B E
0 11 12 15
1 21 22 25
2 31 32 35
However what I expect is to remove all columns after a nan
value is found. In the toy example code the expected output would be:
A B
0 11 12
1 21 22
2 31 32
How can I remove all columns after a nan
is found within any row in a pandas
DataFrame
?
Upvotes: 3
Views: 1267
Reputation: 68146
What I would do:
any
for every column, across the rowsdf.loc[:, ~df.isna().cumsum(axis=1).any(axis=0)]
Give me:
A B
0 11 12
1 21 22
2 31 32
Upvotes: 4
Reputation: 4743
I could find a way as follows to get the expected output:
colFirstNaN = df.isna().any(axis=0).idxmax() # Find column that has first NaN element in any row
indexColLastValue = df.columns.tolist().index(colFirstNaN) -1
ColLastValue = df.columns[indexColLastValue]
out2 = df.loc[:, :ColLastValue]
And the output would be then:
>>> out2
A B
0 11 12
1 21 22
2 31 32
Upvotes: 0