Reputation: 7514
This seems like a ridiculously easy question... but I'm not seeing the easy answer I was expecting.
So, how do I get the value at an nth row of a given column in Pandas? (I am particularly interested in the first row, but would be interested in a more general practice as well).
For example, let's say I want to pull the 1.2 value in Btime
as a variable.
Whats the right way to do this?
>>> df_test
ATime X Y Z Btime C D E
0 1.2 2 15 2 1.2 12 25 12
1 1.4 3 12 1 1.3 13 22 11
2 1.5 1 10 6 1.4 11 20 16
3 1.6 2 9 10 1.7 12 29 12
4 1.9 1 1 9 1.9 11 21 19
5 2.0 0 0 0 2.0 8 10 11
6 2.4 0 0 0 2.4 10 12 15
Upvotes: 546
Views: 1338101
Reputation: 23011
According to pandas docs, at
is the fastest way to access a scalar value such as the use case in the OP (already suggested by Alex on this page).
Building upon Alex's answer, because dataframes don't necessarily have a range index it might be more complete to index df.index
(since dataframe indexes are built on numpy arrays, you can index them like an array) or call get_loc()
on columns to get the integer location of a column.
df.at[df.index[0], 'Btime']
df.iat[0, df.columns.get_loc('Btime')]
One common problem is that if you used a boolean mask to get a single value, but ended up with a value with an index (actually a Series); e.g.:
0 1.2
Name: Btime, dtype: float64
you can use squeeze()
to get the scalar value, i.e.
df.loc[df['Btime']<1.3, 'Btime'].squeeze()
Upvotes: 0
Reputation: 2449
.iat
and .at
are the methods for getting and setting single values and are much faster than .iloc
and .loc
. Mykola Zotko pointed this out in their answer, but they did not use .iat
to its full extent.
When we can use .iat
or .at
, we should only have to index into the dataframe once.
This is not great:
df['Btime'].iat[0]
It is not ideal because the 'Btime' column was first selected as a series, then .iat
was used to index into that series.
These two options are the best:
df.iat[0, 4] # get the value in the zeroth row, and 4th column
df.at[0, 'Btime'] # get the value where the index label is 0 and the column name is "Btime".
Both methods return the value of 1.2.
Upvotes: 13
Reputation: 17794
To access a single value you can use the method iat
that is much faster than iloc
:
df['Btime'].iat[0]
You can also use the method take
:
df['Btime'].take(0)
Upvotes: 13
Reputation: 1238
To get e.g the value from column 'test' and row 1 it works like
df[['test']].values[0][0]
as only df[['test']].values[0]
gives back a array
Upvotes: 7
Reputation: 701
Another way of getting the first row and preserving the index:
x = df.first('d') # Returns the first day. '3d' gives first three days.
Upvotes: 2
Reputation: 879093
To select the ith
row, use iloc
:
In [31]: df_test.iloc[0]
Out[31]:
ATime 1.2
X 2.0
Y 15.0
Z 2.0
Btime 1.2
C 12.0
D 25.0
E 12.0
Name: 0, dtype: float64
To select the ith value in the Btime
column you could use:
In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2
df_test['Btime'].iloc[0]
(recommended) and df_test.iloc[0]['Btime']
:DataFrames store data in column-based blocks (where each block has a single
dtype). If you select by column first, a view can be returned (which is
quicker than returning a copy) and the original dtype is preserved. In contrast,
if you select by row first, and if the DataFrame has columns of different
dtypes, then Pandas copies the data into a new Series of object dtype. So
selecting columns is a bit faster than selecting rows. Thus, although
df_test.iloc[0]['Btime']
works, df_test['Btime'].iloc[0]
is a little bit
more efficient.
There is a big difference between the two when it comes to assignment.
df_test['Btime'].iloc[0] = x
affects df_test
, but df_test.iloc[0]['Btime']
may not. See below for an explanation of why. Because a subtle difference in
the order of indexing makes a big difference in behavior, it is better to use single indexing assignment:
df.iloc[0, df.columns.get_loc('Btime')] = x
df.iloc[0, df.columns.get_loc('Btime')] = x
(recommended):The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,
df.loc[df.index[n], 'Btime'] = x
or
df.iloc[n, df.columns.get_loc('Btime')] = x
The latter method is a bit faster, because df.loc
has to convert the row and column labels to
positional indices, so there is a little less conversion necessary if you use
df.iloc
instead.
df['Btime'].iloc[0] = x
works, but is not recommended:Although this works, it is taking advantage of the way DataFrames are currently implemented. There is no guarantee that Pandas has to work this way in the future. In particular, it is taking advantage of the fact that (currently) df['Btime']
always returns a
view (not a copy) so df['Btime'].iloc[n] = x
can be used to assign a new value
at the nth location of the Btime
column of df
.
Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally always raise a SettingWithCopyWarning
even though in this case the assignment succeeds in modifying df
:
In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
In [26]: df
Out[26]:
foo bar
0 A 99 <-- assignment succeeded
2 B 100
1 C 100
df.iloc[0]['Btime'] = x
does not work:In contrast, assignment with df.iloc[0]['bar'] = 123
does not work because df.iloc[0]
is returning a copy:
In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [67]: df
Out[67]:
foo bar
0 A 99 <-- assignment failed
2 B 100
1 C 100
Warning: I had previously suggested df_test.ix[i, 'Btime']
. But this is not guaranteed to give you the ith
value since ix
tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i]
will return the row labeled i
rather than the ith
row. For example,
In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [2]: df
Out[2]:
foo
0 A
2 B
1 C
In [4]: df.ix[1, 'foo']
Out[4]: 'C'
Upvotes: 841
Reputation: 137
In a general way, if you want to pick up the first N rows from the J column from pandas dataframe
the best way to do this is:
data = dataframe[0:N][:,J]
Upvotes: 12
Reputation: 2903
Another way to do this:
first_value = df['Btime'].values[0]
This way seems to be faster than using .iloc
:
In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 49
Reputation: 141
df.iloc[0].head(1)
- First data set only from entire first row.df.iloc[0]
- Entire First row in column.Upvotes: 14
Reputation: 1843
Note that the answer from @unutbu will be correct until you want to set the value to something new, then it will not work if your dataframe is a view.
In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Another approach that will consistently work with both setting and getting is:
In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
foo bar
0 A 99
2 B 100
1 C 100
Upvotes: 37