Elham

Reputation: 867

Finding the last valid position in each column of a pandas DataFrame

I have done some searching to answer this question, but I cannot figure out how to do this:

I have a dataset with 185 rows and 30 columns. Not every row has a value in every column. For each column, I want to find the position of the last value and take its index. I am not sure how to do this, because the code below gives me the length of the whole data frame, not the number of values in that column:

len(data_exam['col'])

I would appreciate any suggestions.

Also, if I want to process all the columns in a loop, would the following code be a good choice?

 lengths = []
 for col in data:
     lengths.append(len(data[col]))  # placeholder: function that finds the length of the column

Thanks.

Upvotes: 3

Views: 3070

Answers (3)

Alexander

Reputation: 109528

You can use last_valid_index to find the index label of the last valid (non-NaN) value in a given column. Its twin, first_valid_index, does the same for the first valid value.

# Set up sample data.
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))

df.iloc[2:, 0] = np.nan
df.iloc[4:, 1] = np.nan
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2       NaN -0.151357 -0.103219
3       NaN  0.144044  1.454274
4       NaN       NaN  0.443863

# Solution to find index of last valid values per column.
>>> df.apply(lambda series: series.last_valid_index())
A    1  # <== Index of last valid data in column A.
B    3  # <== Index of last valid data in column B.
C    4  # <== Index of last valid data in column C.
dtype: int64
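
For the looping part of the question, here is a minimal sketch that assumes the same sample frame as above. The dictionary name `last_positions` is only illustrative. Iterating over a DataFrame yields its column labels, so each column can be looked up and queried in turn:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used above.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
df.iloc[2:, 0] = np.nan
df.iloc[4:, 1] = np.nan

# Iterating over a DataFrame yields its column labels.
last_positions = {}
for col in df:
    # Index label of the last non-NaN entry in this column.
    last_positions[col] = df[col].last_valid_index()

print(last_positions)
```

Note that last_valid_index returns None for a column that contains no valid values, so an all-NaN column needs separate handling.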

Upvotes: 7

BENY

Reputation: 323226

df.reset_index().melt('index').dropna().groupby('variable')['index'].max()
Out[487]: 
variable
A    3
B    0
C    4
Name: index, dtype: int64
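
The one-liner can be unpacked step by step. This is a sketch, assuming a frame with A = [1, 2, 3, 4, NaN], B = [1, NaN, NaN, NaN, NaN] and C = [1, 2, 3, 4, 5], which is consistent with the output shown above. The intermediate variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the answer does not state which frame it used.
df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan],
                   'B': [1, np.nan, np.nan, np.nan, np.nan],
                   'C': [1, 2, 3, 4, 5]})

# 1. Move the row labels into a regular column named 'index'.
with_index = df.reset_index()
# 2. Reshape to long form: one row per (index, variable, value) triple.
long_form = with_index.melt('index')
# 3. Drop the rows whose value is NaN.
valid = long_form.dropna()
# 4. Per original column, keep the largest remaining row label.
last_idx = valid.groupby('variable')['index'].max()

print(last_idx)
```

Because the final step takes the maximum label, it matches the last valid position only when the index is sorted in ascending order. A column that is entirely NaN disappears after dropna and is missing from the result.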

Upvotes: 2

Scott Boston

Reputation: 153460

IIUC, you want the value of the last non-NaN entry in each column:

df[::-1].bfill().iloc[0]

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan], 'B': [1, np.nan, np.nan, np.nan, np.nan], 'C': [1, 2, 3, 4, 5]})

     A    B  C
0  1.0  1.0  1
1  2.0  NaN  2
2  3.0  NaN  3
3  4.0  NaN  4
4  NaN  NaN  5

Output:

A    4.0
B    1.0
C    5.0
Name: 4, dtype: float64
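
To see why this works, here is a sketch on the example frame above: reversing the rows and back-filling pulls each column's last non-NaN value into the first row. Forward-filling the original frame and taking the final row should give the same values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, np.nan],
                   'B': [1, np.nan, np.nan, np.nan, np.nan],
                   'C': [1, 2, 3, 4, 5]})

# Reverse the rows, back-fill NaNs, then take the first row.
last_values = df[::-1].bfill().iloc[0]

# Alternative reading: forward-fill the original frame and take the final row.
last_values_ffill = df.ffill().iloc[-1]

print(last_values)
print(last_values_ffill)
```

Both variants return values, not index labels. To get the labels, use last_valid_index on each column.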

Upvotes: 5
