Reputation: 59504
I have a pandas DataFrame df1:
STK_ID RPT_Date TClose sales discount
0 000568 20060331 3.69 5.975 NaN
1 000568 20060630 9.14 10.143 NaN
2 000568 20060930 9.49 13.854 NaN
3 000568 20061231 15.84 19.262 NaN
4 000568 20070331 17.00 6.803 NaN
5 000568 20070630 26.31 12.940 NaN
6 000568 20070930 39.12 19.977 NaN
7 000568 20071231 45.94 29.269 NaN
8 000568 20080331 38.75 12.668 NaN
9 000568 20080630 30.09 21.102 NaN
10 000568 20080930 26.00 30.769 NaN
I wanted to select the last 3 rows and tried df1.ix[-3:], but it returns all the rows. Why? How can I get the last 3 rows of df1? I'm using pandas 0.10.1.
Upvotes: 250
Views: 479727
Reputation: 23111
The top two answers suggest that there are two ways to get the same output, but if you look at the source code, .tail(n) is syntactic sugar for .iloc[-n:] (for n > 0). For the task of getting the last n rows, as in the title, they are exactly the same.
However, they differ if we want the last n rows of each group, because there is no groupby.iloc but there is groupby.tail (and groupby.nth).
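To make the per-group point concrete, here is a minimal sketch comparing groupby.tail with a hand-rolled .iloc[-n:] applied to each group. The sample frame and the names via_tail and via_iloc are illustrative assumptions, not part of the original answers. The two results match here only because the rows are already ordered by group; groupby.tail keeps the original row order, whereas the concatenation is ordered group by group.

```python
import pandas as pd

# Illustrative data (an assumption): three groups of different sizes.
df = pd.DataFrame({"A": list("aaabbbbc"), "B": range(1, 9)})
n = 2

# Built-in: last n rows of every group, kept in original row order.
via_tail = df.groupby("A").tail(n)

# Manual equivalent: slice each group by position, then stitch the pieces together.
via_iloc = pd.concat([group.iloc[-n:] for _, group in df.groupby("A")])

print(via_tail)
print(via_tail.equals(via_iloc))
```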
Another, slightly obscure, way to get the last rows is via take(), which is similar to numpy.take; however, we have to pass a list of indices: df.take(range(-n, 0)).
The main difference between take and tail/iloc is error handling. If the dataframe has fewer than n rows and we ask for the last n rows, tail/iloc returns the entire dataframe, while take raises an error. This comes in handy if you're calling an API, web scraping, etc., where you expect the dataframe to have a certain shape but something fails unexpectedly; tail may silently produce wrong output, while take can alert you.
import pandas as pd
df = pd.DataFrame({'a': [1, 2]})
df.tail(5) # <--- entire dataframe
df.iloc[-5:] # <--- entire dataframe
df.take(range(-5, 0)) # <--- IndexError: indices are out-of-bounds
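One way to use this behaviour as a guard is sketched below. The helper name last_n_strict and its error message are hypothetical, introduced only for illustration; the sketch assumes that take raises IndexError for out-of-bounds positions, as described above.

```python
import pandas as pd


def last_n_strict(frame, n):
    """Return the last n rows, or raise ValueError if the frame is too short.

    Hypothetical helper: it relies on DataFrame.take raising IndexError
    when a requested position is out of bounds.
    """
    try:
        # Negative positions count from the end, e.g. [-2, -1] for n = 2.
        return frame.take(list(range(-n, 0)))
    except IndexError as exc:
        raise ValueError(f"expected at least {n} rows, got {len(frame)}") from exc


df = pd.DataFrame({"a": [1, 2]})
print(last_n_strict(df, 2))
```

Calling last_n_strict(df, 5) on this two-row frame would raise ValueError instead of quietly returning both rows, which is the failure mode tail and iloc hide.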
Upvotes: 0
Reputation: 402483
How to get the last N rows of a pandas DataFrame?
If you are slicing by position, __getitem__ (i.e., slicing with []) works well, and is the most succinct solution I've found for this problem.
import numpy as np
import pandas as pd
pd.__version__
# '0.24.2'
df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df
A B
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
6 b 7
7 c 8
df[-3:]
A B
5 b 6
6 b 7
7 c 8
This is the same as calling df.iloc[-3:], for instance (iloc internally delegates to __getitem__).
As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:
df.groupby('A').tail(2)
A B
1 a 2
2 a 3
5 b 6
6 b 7
7 c 8
Upvotes: 14
Reputation: 375475
This happens because the index holds integers: ix treats -3 as a label rather than a position, so the slice selects every row whose label is at or after -3, which is all of them. This is by design: see integer indexing in pandas "gotchas"*.
*In newer versions of pandas prefer loc or iloc to remove the ambiguity of ix as position or label:
df.iloc[-3:]
See the docs.
As Wes points out, in this specific case you should just use tail!
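The label-versus-position distinction can be reproduced with a small sketch. Since ix was removed in pandas 1.0, the example uses loc as a stand-in for the label-based reading of -3 (an assumption made for illustration; it is not the original ix call), and the five-row frame is a shortened, made-up version of the question's data.

```python
import pandas as pd

# Shortened stand-in for the question's frame, with a default integer index 0..4.
df1 = pd.DataFrame({"TClose": [3.69, 9.14, 9.49, 15.84, 17.00]})

# Label-based slice: "every row whose label is >= -3", which is every row here.
by_label = df1.loc[-3:]

# Position-based slice: the last three rows.
by_position = df1.iloc[-3:]

print(len(by_label), len(by_position))
print(by_position.equals(df1.tail(3)))
```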
Upvotes: 115