Reputation: 867
I have done some searching to answer this question, but I cannot figure out how to do this:
I have a dataset with 185 rows and 30 columns. Not every row has a value in every column. For each column, I want to find the position of the last value and take its index. I am not sure how to do this, because the code below gives me the length of the whole data frame, not just of the values in that column:
len(data_exam['col'])
I would appreciate any suggestions.
I also want to read all the columns in a loop. Would the following code be a good choice?
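lengths = []  # avoid naming this "list", which shadows the built-in
for col in data:
    # find_length is a placeholder for a function which finds the length of the column
    lengths.append(find_length(data[col]))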
Thanks.
Upvotes: 3
Views: 3070
Reputation: 109528
You can use last_valid_index to find the index location of the last valid value in a given column. There is also its twin, first_valid_index.
import numpy as np
import pandas as pd

# Set up sample data: column A is NaN from row 2 on, column B from row 4 on.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
df.iloc[2:, 0] = np.nan
df.iloc[4:, 1] = np.nan
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2       NaN -0.151357 -0.103219
3       NaN  0.144044  1.454274
4       NaN       NaN  0.443863
# Solution to find index of last valid values per column.
>>> df.apply(lambda series: series.last_valid_index())
A    1  # <== Index of last valid data in column A.
B    3  # <== Index of last valid data in column B.
C    4  # <== Index of last valid data in column C.
dtype: int64
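The question also asked about reading all columns in a loop. A minimal sketch of that, reusing the sample frame from this answer: the names last_valid and non_missing are illustrative, and count() is included on the assumption that "length of the column" meant the number of non-missing values.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the answer above.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
df.iloc[2:, 0] = np.nan
df.iloc[4:, 1] = np.nan

# Iterating over a DataFrame yields its column labels.
last_valid = {}
for col in df:
    # Index label of the last non-NaN entry (None if the column is all NaN).
    last_valid[col] = df[col].last_valid_index()

# Number of non-missing values per column; len(df[col]) counts NaN rows too.
non_missing = df.count()

print(last_valid)
print(non_missing)
```

The loop and the apply call give the same labels. Note that count() only matches "last valid index + 1" when the missing values all sit at the end of the column.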
Upvotes: 7
Reputation: 323226
df.reset_index().melt('index').dropna().groupby('variable')['index'].max()
Out[487]:
variable
A    3
B    0
C    4
Name: index, dtype: int64
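The answer does not say which data produced this output. A sketch that reproduces it, assuming a frame whose last valid rows are 3, 0 and 4 (this particular frame is an assumption, not something the answer states), with the chain split into steps:

```python
import numpy as np
import pandas as pd

# Assumed sample data: last valid rows are 3 (A), 0 (B) and 4 (C).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, np.nan],
    "B": [1, np.nan, np.nan, np.nan, np.nan],
    "C": [1, 2, 3, 4, 5],
})

# Move the row labels into a regular column named "index".
with_index = df.reset_index()

# Long format: one row per (index, column) pair, with columns
# "index", "variable" (the original column name) and "value".
long_form = with_index.melt("index")

# Drop the pairs whose value is NaN, then take the largest remaining
# row label for each original column.
result = long_form.dropna().groupby("variable")["index"].max()

print(result)
```

Taking the maximum gives the last valid row only when the index is sorted in ascending order, as the default RangeIndex is.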
Upvotes: 2
Reputation: 153460
IIUC, you want the value of the last non-NaN entry in each column:
df[::-1].bfill().iloc[0]
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4,np.nan],'B':[1,np.nan,np.nan,np.nan,np.nan],'C':[1,2,3,4,5]})
     A    B  C
0  1.0  1.0  1
1  2.0  NaN  2
2  3.0  NaN  3
3  4.0  NaN  4
4  NaN  NaN  5
Output:
A    4.0
B    1.0
C    5.0
Name: 4, dtype: float64
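An equivalent way to get the same values, sketched here as an alternative rather than as part of the original answer, is to forward-fill and take the final row. The cross-check through last_valid_index assumes every column has at least one non-NaN value; the names last_values and by_label are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, np.nan],
    "B": [1, np.nan, np.nan, np.nan, np.nan],
    "C": [1, 2, 3, 4, 5],
})

# Forward-fill carries each column's last valid value down to the final row.
last_values = df.ffill().iloc[-1]

# Cross-check: look each value up through the label of its last valid entry.
# Assumes no column is entirely NaN (last_valid_index would return None).
by_label = pd.Series(
    {col: df.loc[df[col].last_valid_index(), col] for col in df}
)

print(last_values)
print(by_label)
```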
Upvotes: 5