frankgut
frankgut

Reputation: 125

Pandas select the second to last column which is also not nan

I've cleaned my data as much as I can and read them in Pandas dataframe. So the problem is that different files have different number of columns, but it always the second to the last non-nan column is what I want. So is there anyway to pick them out? Here is an example of the data.

     ...    f       g      h      l
0    ...    39994  29.568  29.569 NaN  
1    ...    39994  29.568  29.569 NaN  
2    ...    39994  29.568  29.569 NaN 

so I want the column g in this case. So in other files, it could be f or anything depends on the number of nan columns in the end. But it's always the second to the last non-nan column is what I need. Thanks for the help ahead.

Upvotes: 3

Views: 9597

Answers (2)

root
root

Reputation: 33793

Similar idea to @piRSquared. Essentially, use loc to keep the non-null columns, then use iloc to select the second to last.

df.loc[:, ~df.isnull().all()].iloc[:, -2]

Sample input:

   a  b  c   d   e   f   g   h   i   j
0  0  3  6   9  12  15  18  21 NaN NaN
1  1  4  7  10  13  16  19  22 NaN NaN
2  2  5  8  11  14  17  20  23 NaN NaN

Sample output:

0    18
1    19
2    20
Name: g, dtype: int32

Upvotes: 3

piRSquared
piRSquared

Reputation: 294328

One liner

df.loc[:, :df.columns[(df.columns == df.isnull().all().idxmax()).argmax() - 2]]

   ...      f       g
0  ...  39994  29.568
1  ...  39994  29.568
2  ...  39994  29.568

Readable

# identify null columns
nullcols = df.isnull().all()

# find the column heading for the first null column
nullcol = nullcols.idxmax()

# where is null column at
nullcol_position = (df.columns == nullcol).argmax()

# get column 2 positions prior
col_2_prior_to_null_col = df.columns[nullcol_position - 2]

# get dataframe
print df.loc[:, :col_2_prior_to_null_col]

Upvotes: 2

Related Questions