bigbug

Reputation: 59504

How to get the last N rows of a pandas DataFrame?

I have a pandas DataFrame df1:

    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

I wanted to select the last 3 rows and tried df1.ix[-3:], but it returns all the rows. Why? How do I get the last 3 rows of df1? I'm using pandas 0.10.1.

Upvotes: 250

Views: 479727

Answers (4)

Wes McKinney

Reputation: 105521

Don't forget DataFrame.tail! e.g. df1.tail(10)
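A minimal sketch of this on a cut-down stand-in for the question's df1 (the values are copied from the table in the question; only three columns and the last five rows are kept for brevity):

```python
import pandas as pd

# Small stand-in for df1 from the question (subset of columns and rows).
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 5,
    "RPT_Date": ["20070630", "20070930", "20071231", "20080331", "20080630"],
    "TClose": [26.31, 39.12, 45.94, 38.75, 30.09],
})

# tail(n) returns the last n rows and keeps their original index labels.
last3 = df1.tail(3)
print(last3)
```

Called without an argument, tail() defaults to the last 5 rows.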

Upvotes: 567

cottontail

Reputation: 23111

The top two answers suggest that there are two different ways to get the same output, but if you look at the source code, .tail(n) is syntactic sugar for .iloc[-n:]. For the task of getting the last n rows, as in the title, they are exactly the same.
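A quick way to check the equivalence yourself is sketched below. The frame and variable names are made up for illustration. One edge case worth knowing, as far as I can tell from the pandas source: tail(0) is special-cased to return an empty frame, whereas iloc[-0:] is just iloc[0:] and returns everything.

```python
import pandas as pd

# Illustrative frame with a default integer index.
df = pd.DataFrame({"a": range(10)})

n = 3
# For positive n, both spellings give the same rows with the same index.
same_for_positive_n = df.tail(n).equals(df.iloc[-n:])

# Edge case: tail(0) is empty, while iloc[-0:] == iloc[0:] is every row.
tail_zero_len = len(df.tail(0))
iloc_zero_len = len(df.iloc[-0:])

print(same_for_positive_n, tail_zero_len, iloc_zero_len)
```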

However, they are different if we want to get the last n rows of a group because there is no groupby.iloc but there is groupby.tail (and groupby.nth).


Another, slightly obscure, way to get the last rows is via take() which is similar to numpy.take; however, we have to pass a list of indices: df.take(range(-n, 0)).

The main difference between this and tail/iloc is error handling. If the dataframe has fewer than n rows and we ask for the last n rows, tail/iloc return the entire dataframe, while take raises an error. This comes in handy if you're making calls to an API, web scraping, etc., where you expect the dataframe to have a certain shape but something fails unexpectedly: tail may silently produce the wrong output, while take alerts you.

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
df.tail(5)             # <--- entire dataframe
df.iloc[-5:]           # <--- entire dataframe
df.take(range(-5,0))   # <--- IndexError: indices are out-of-bounds
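If you want that strictness in a pipeline, one option is to wrap take in a small helper. This is only a sketch: last_n_strict is a hypothetical name, not a pandas function, and it relies on take raising IndexError for out-of-range positions as shown above.

```python
import pandas as pd

def last_n_strict(df, n):
    """Return the last n rows, raising if df has fewer than n rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    # take() validates positions, so out-of-range negatives raise IndexError.
    return df.take(list(range(-n, 0)))

df = pd.DataFrame({"a": [1, 2]})

ok = last_n_strict(df, 2)  # fine: the frame has exactly 2 rows

try:
    last_n_strict(df, 5)   # the frame is too short
    raised = False
except IndexError:
    raised = True

print(ok, raised)
```

The caller can then decide whether to retry, log, or abort, instead of carrying a short frame forward unnoticed.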

Upvotes: 0

cs95

Reputation: 402483

If you are slicing by position, __getitem__ (i.e., slicing with []) works well, and is the most succinct solution I've found for this problem.

import numpy as np
import pandas as pd

pd.__version__
# '0.24.2'

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df

   A  B
0  a  1
1  a  2
2  a  3
3  b  4
4  b  5
5  b  6
6  b  7
7  c  8

df[-3:]

   A  B
5  b  6
6  b  7
7  c  8

This is the same as calling df.iloc[-3:], for instance (iloc internally delegates to __getitem__).


As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:

df.groupby('A').tail(2)

   A  B
1  a  2
2  a  3
5  b  6
6  b  7
7  c  8

Upvotes: 14

Andy Hayden

Reputation: 375475

This happens because the index holds integers: ix treats -3 as a label rather than a position, so the slice starts at the first label greater than or equal to -3, which is every row. This is by design: see integer indexing in pandas "gotchas"*.

*In newer versions of pandas, prefer loc or iloc, which remove the ambiguity of ix between position and label (ix was deprecated in 0.20 and removed in 1.0):

df.iloc[-3:]

See the docs.
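Since ix is gone from current pandas, here is a rough sketch of the same label-versus-position difference using loc as the label-based stand-in. The frame reuses the TClose values from the question; the variable names are illustrative.

```python
import pandas as pd

# Default integer index 0..10, like df1 in the question.
df = pd.DataFrame({"TClose": [3.69, 9.14, 9.49, 15.84, 17.00, 26.31,
                              39.12, 45.94, 38.75, 30.09, 26.00]})

by_label = df.loc[-3:]      # label slice: every label >= -3, so all rows
by_position = df.iloc[-3:]  # positional slice: the last three rows

print(len(by_label), list(by_position.index))
```

The label-based slice reproduces the "returns all the rows" behaviour from the question, while the positional slice gives the intended result.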

As Wes points out, in this specific case you should just use tail!

Upvotes: 115
