Reputation: 131038
I have constructed a condition that extracts exactly one row from my dataframe:
d2 = df[(df['l_ext']==l_ext) & (df['item']==item) & (df['wn']==wn) & (df['wd']==1)]
Now I would like to take a value from a particular column:
val = d2['col_name']
But as a result, I get a dataframe that contains one row and one column (i.e., one cell). It is not what I need. I need one value (one float number). How can I do it in pandas?
Upvotes: 744
Views: 2707146
Reputation: 21
You can get the values like this:
df[(df['column1']==any_value) & (df['column2']==any_value) & (df['column']==any_value)]['column_with_values_to_get']
And you can add (df['columnx']==any_value)
as much as you want
Upvotes: 0
Reputation: 23011
If a single row was filtered from a dataframe, one way to get a scalar value from a single cell is squeeze()
(or item()
):
df = pd.DataFrame({'A':range(5), 'B': range(5)})
d2 = df[df['A'].le(5) & df['B'].eq(3)]
val = d2['A'].squeeze() # 3
val = d2['A'].item() # 3
In fact, item()
may be called on the index, so item
+ at
combo could work.
msk = df['A'].le(5) & df['B'].eq(3)
val = df.at[df.index[msk].item(), 'B'] # 3
In fact, the latter method is much faster than any other method listed here to get a single cell value.
df = pd.DataFrame({'A':range(10000), 'B': range(10000)})
msk = df['A'].le(5) & df['B'].eq(3)
%timeit df.at[df.index[msk].item(), 'A']
# 31.4 µs ± 5.83 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit df.loc[msk, 'A'].squeeze()
# 143 µs ± 8.99 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit df.loc[msk, 'A'].item()
# 125 µs ± 1.56 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit df.loc[msk, 'A'].iat[0]
# 125 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit df[msk]['A'].values[0]
# 189 µs ± 8.67 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Upvotes: 8
Reputation: 309
Converting it to integer worked for me but if you need float it is also simple:
int(sub_df.iloc[0])
for float:
float(sub_df.iloc[0])
Upvotes: 5
Reputation: 11
Display the data from a certain cell in pandas dataframe
Using dataframe.iloc,
Dataframe.iloc should be used when given index is the actual index made when the pandas dataframe is created.
Avoid using dataframe.iloc on custom indices.
print(df['REVIEWLIST'].iloc[df.index[1]])
Using dataframe.loc,
Use dataframe.loc if you're using a custom index it can also be used instead of iloc too even the dataframe contains default indices.
print(df['REVIEWLIST'].loc[df.index[1315]])
Upvotes: 1
Reputation: 1250
I needed the value of one cell, selected by column and index names. This solution worked for me:
df.loc[1,:].values[0]
Upvotes: 36
Reputation: 322
In later versions, you can fix it by simply doing:
val = float(d2['col_name'].iloc[0])
Upvotes: 14
Reputation: 218
I've run across this when using dataframes with MultiIndexes and found squeeze useful.
From the documentation:
Squeeze 1 dimensional axis objects into scalars.
Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.
# Example for a dataframe with MultiIndex
> import pandas as pd
> df = pd.DataFrame(
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
],
index=pd.MultiIndex.from_tuples( [('i', 1), ('ii', 2), ('iii', 3)] ),
columns=pd.MultiIndex.from_tuples( [('A', 'a'), ('B', 'b'), ('C', 'c')] )
)
> df
A B C
a b c
i 1 1 2 3
ii 2 4 5 6
iii 3 7 8 9
> df.loc['ii', 'B']
b
2 5
> df.loc['ii', 'B'].squeeze()
5
Note that while df.at[]
also works (if you aren't needing to use conditionals) you then still AFAIK need to specify all levels of the MultiIndex.
Example:
> df.at[('ii', 2), ('B', 'b')]
5
I have a dataframe with a six-level index and two-level columns, so only having to specify the outer level is quite helpful.
Upvotes: 7
Reputation: 1105
It looks like changes after pandas 10.1 or 13.1.
I upgraded from 10.1 to 13.1. Before, iloc is not available.
Now with 13.1, iloc[0]['label']
gets a single value array rather than a scalar.
Like this:
lastprice = stock.iloc[-1]['Close']
Output:
date
2014-02-26 118.2
name:Close, dtype: float64
Upvotes: 26
Reputation: 5832
To get the full row's value as JSON (instead of a Serie):
row = df.iloc[0]
Use the to_json
method like below:
row.to_json()
Upvotes: -3
Reputation: 3871
You can turn your 1x1 dataframe into a NumPy array, then access the first and only value of that array:
val = d2['col_name'].values[0]
Upvotes: 339
Reputation: 2771
Most answers are using iloc which is good for selection by position.
If you need selection-by-label, loc would be more convenient.
For getting a value explicitly (equiv to deprecated df.get_value('a','A'))
# This is also equivalent to df1.at['a','A'] In [55]: df1.loc['a', 'A'] Out[55]: 0.13200317033032932
Upvotes: 53
Reputation: 369
The quickest and easiest options I have found are the following. 501 represents the row index.
df.at[501, 'column_name']
df.get_value(501, 'column_name')
Upvotes: 19
Reputation: 81
I am not sure if this is a good practice, but I noticed I can also get just the value by casting the series as float
.
E.g.,
rate
3 0.042679
Name: Unemployment_rate, dtype: float64
float(rate)
0.0426789
Upvotes: 8
Reputation: 375367
If you have a DataFrame with only one row, then access the first (only) row as a Series using iloc, and then the value using the column name:
In [3]: sub_df
Out[3]:
A B
2 -0.133653 -0.030854
In [4]: sub_df.iloc[0]
Out[4]:
A -0.133653
B -0.030854
Name: 2, dtype: float64
In [5]: sub_df.iloc[0]['A']
Out[5]: -0.13365288513107493
Upvotes: 825
Reputation: 365
For pandas 0.10, where iloc is unavailable, filter a DF
and get the first row data for the column VALUE
:
df_filt = df[df['C1'] == C1val & df['C2'] == C2val]
result = df_filt.get_value(df_filt.index[0],'VALUE')
If there is more than one row filtered, obtain the first row value. There will be an exception if the filter results in an empty data frame.
Upvotes: 6
Reputation: 128918
These are fast access methods for scalars:
In [15]: df = pandas.DataFrame(numpy.random.randn(5, 3), columns=list('ABC'))
In [16]: df
Out[16]:
A B C
0 -0.074172 -0.090626 0.038272
1 -0.128545 0.762088 -0.714816
2 0.201498 -0.734963 0.558397
3 1.563307 -1.186415 0.848246
4 0.205171 0.962514 0.037709
In [17]: df.iat[0, 0]
Out[17]: -0.074171888537611502
In [18]: df.at[0, 'A']
Out[18]: -0.074171888537611502
Upvotes: 385
Reputation: 523
Using .item()
returns a scalar (not a Series
), and it only works if there is a single element selected. It's much safer than .values[0]
which will return the first element regardless of how many are selected.
>>> df = pd.DataFrame({'a': [1,2,2], 'b': [4,5,6]})
>>> df[df['a'] == 1]['a'] # Returns a Series
0 1
Name: a, dtype: int64
>>> df[df['a'] == 1]['a'].item()
1
>>> df2 = df[df['a'] == 2]
>>> df2['b']
1 5
2 6
Name: b, dtype: int64
>>> df2['b'].values[0]
5
>>> df2['b'].item()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/pandas/core/base.py", line 331, in item
raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
Upvotes: 2
Reputation: 1057
It doesn't need to be complicated:
val = df.loc[df.wd==1, 'col_name'].values[0]
Upvotes: 55
Reputation: 139
df_gdp.columns
Index([u'Country', u'Country Code', u'Indicator Name', u'Indicator Code', u'1960', u'1961', u'1962', u'1963', u'1964', u'1965', u'1966', u'1967', u'1968', u'1969', u'1970', u'1971', u'1972', u'1973', u'1974', u'1975', u'1976', u'1977', u'1978', u'1979', u'1980', u'1981', u'1982', u'1983', u'1984', u'1985', u'1986', u'1987', u'1988', u'1989', u'1990', u'1991', u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011', u'2012', u'2013', u'2014', u'2015', u'2016'], dtype='object')
df_gdp[df_gdp["Country Code"] == "USA"]["1996"].values[0]
8100000000000.0
Upvotes: 9