Reputation: 125
I've cleaned my data as much as I can and read it into a Pandas DataFrame. The problem is that different files have different numbers of columns, but the column I want is always the second-to-last non-NaN column. Is there any way to pick it out? Here is an example of the data.
... f g h l
0 ... 39994 29.568 29.569 NaN
1 ... 39994 29.568 29.569 NaN
2 ... 39994 29.568 29.569 NaN
So in this case I want column g. In other files it could be f or any other column, depending on how many NaN columns there are at the end, but what I need is always the second-to-last non-NaN column. Thanks in advance for the help.
Upvotes: 3
Views: 9597
Reputation: 33793
Similar idea to @piRSquared's: use loc to keep the non-null columns, then iloc to select the second-to-last one.
df.loc[:, ~df.isnull().all()].iloc[:, -2]
Sample input:
a b c d e f g h i j
0 0 3 6 9 12 15 18 21 NaN NaN
1 1 4 7 10 13 16 19 22 NaN NaN
2 2 5 8 11 14 17 20 23 NaN NaN
Sample output:
0 18
1 19
2 20
Name: g, dtype: int32
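Putting the one-liner into a self-contained sketch (the frame below just rebuilds the sample input above, with the all-NaN columns i and j):

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the answer: columns i and j are entirely NaN.
df = pd.DataFrame({
    'a': [0, 1, 2], 'b': [3, 4, 5], 'c': [6, 7, 8], 'd': [9, 10, 11],
    'e': [12, 13, 14], 'f': [15, 16, 17], 'g': [18, 19, 20],
    'h': [21, 22, 23], 'i': np.nan, 'j': np.nan,
})

# Keep only columns that are not entirely null, then take the
# second-to-last of the survivors.
result = df.loc[:, ~df.isnull().all()].iloc[:, -2]
print(result.name)      # g
print(result.tolist())  # [18, 19, 20]
```

Note that `~df.isnull().all()` is a boolean mask over columns, so `loc` drops the all-NaN columns before `iloc` counts from the end.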
Upvotes: 3
Reputation: 294328
df.loc[:, :df.columns[(df.columns == df.isnull().all().idxmax()).argmax() - 2]]
... f g
0 ... 39994 29.568
1 ... 39994 29.568
2 ... 39994 29.568
# identify columns that are entirely null
nullcols = df.isnull().all()
# find the label of the first all-null column
nullcol = nullcols.idxmax()
# integer position of that column
nullcol_position = (df.columns == nullcol).argmax()
# label of the column two positions prior
col_2_prior_to_null_col = df.columns[nullcol_position - 2]
# slice the dataframe up to (and including) that column
print(df.loc[:, :col_2_prior_to_null_col])
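The steps above can be collected into a runnable sketch; the toy frame below uses the question's f/g/h/l column names with made-up values, where only the trailing column l is all NaN:

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the question's data: trailing column l is all NaN.
df = pd.DataFrame({
    'f': [39994, 39994, 39994],
    'g': [29.568, 29.568, 29.568],
    'h': [29.569, 29.569, 29.569],
    'l': np.nan,
})

nullcols = df.isnull().all()                 # which columns are entirely null
nullcol = nullcols.idxmax()                  # label of the first null column: 'l'
nullcol_position = (df.columns == nullcol).argmax()         # its integer position
col_2_prior_to_null_col = df.columns[nullcol_position - 2]  # two positions back: 'g'
result = df.loc[:, :col_2_prior_to_null_col]
print(result.columns.tolist())  # ['f', 'g']
```

Since label-based `loc` slicing is inclusive of its endpoint, the result keeps every column up to and including g, matching the output shown above.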
Upvotes: 2