Reputation: 894
I am trying to return a specific item from a Pandas DataFrame via conditional selection (and do not want to have to reference the index to do so).
Here is an example:
I have the following dataframe:
   Code  Colour      Fruit
0     1     red      apple
1     2  orange     orange
2     3  yellow     banana
3     4   green       pear
4     5    blue  blueberry
I enter the following code to search for the code for blueberries:
df[df['Fruit'] == 'blueberry']['Code']
This returns:
4 5
Name: Code, dtype: int64
which is of type:
pandas.core.series.Series
but what I actually want to return is the number 5 of type:
numpy.int64
which I can do if I enter the following code:
df[df['Fruit'] == 'blueberry']['Code'][4]
i.e. referencing the index to give the number 5, but I do not want to have to reference the index!
Is there another syntax that I can deploy here to achieve the same thing?
Thank you!...
Update:
One further idea is this code:
df[df['Fruit'] == 'blueberry']['Code'][df[df['Fruit']=='blueberry'].index[0]]
However, this does not seem particularly elegant (and it references the index). Is there a more concise and precise method that does not need to reference the index or is this strictly necessary?
Thanks!...
Upvotes: 8
Views: 32553
Reputation: 1
Easiest solution: convert the pandas.core.series.Series to an integer! (This only works when the selection contains exactly one element.)
my_code = int(df[df['Fruit'] == 'blueberry']['Code'])
print(my_code)
Outputs:
5
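On recent pandas releases, calling int() directly on a one-element Series may emit a deprecation warning, so here is a hedged sketch of the same idea that pulls out the single element by position first. The DataFrame is rebuilt from the question's table, and the name `matches` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})

# Boolean selection still returns a Series (here with one element).
matches = df.loc[df["Fruit"] == "blueberry", "Code"]

# Take the first element by position, then convert it to a plain Python int.
my_code = int(matches.iloc[0])
print(my_code)
```

Note that .iloc[0] is positional, so it does not depend on the row label (4 in the question), but it raises an IndexError if nothing matches.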
Upvotes: 0
Reputation: 11
You can also set your 'Fruit' column as an index
df_fruit_index = df.set_index('Fruit')
and extract the value from the 'Code' column based on the fruit you choose
df_fruit_index.loc['blueberry','Code']
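A self-contained sketch of this approach, assuming the fruit names are unique (if they are not, .loc returns a Series rather than a scalar). The DataFrame is rebuilt from the question's table, and the uniqueness check is an added safeguard, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})

# Use the fruit name as the row label.
df_fruit_index = df.set_index("Fruit")

# Assumption: each fruit appears once, so a label lookup yields a scalar.
assert df_fruit_index.index.is_unique

code = df_fruit_index.loc["blueberry", "Code"]
print(code)
```

This is convenient when many lookups by fruit name are needed, since the index is built once and reused.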
Upvotes: 0
Reputation: 164843
Referencing the index is a requirement (unless you use next()^), since a pd.Series is not guaranteed to have exactly one value.
You can use pd.Series.values to extract the values as an array. This also works if you have multiple matches:
res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values
# array([5], dtype=int64)
df2 = pd.concat([df]*5)
res = df2.loc[df2['Fruit'] == 'blueberry', 'Code'].values
# array([5, 5, 5, 5, 5], dtype=int64)
To get a list from the numpy array, you can use .tolist():
res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values.tolist()
Both the array and the list versions can be indexed intuitively, e.g. res[0] for the first item.
^ If you are really opposed to using the index, you can use next() to iterate:
next(iter(res))
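As a rough, self-contained illustration of the next() idea, the sketch below also passes a default so that an empty selection does not raise StopIteration. The fruit "kiwi" is a hypothetical value that is not in the table. The last line uses pd.Series.item(), an alternative not mentioned in this answer, which returns the single element and raises a ValueError when the Series does not hold exactly one value.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})

res = df.loc[df["Fruit"] == "blueberry", "Code"].values

# First match without writing an explicit index.
first = next(iter(res))

# "kiwi" is a hypothetical fruit with no match; the default avoids StopIteration.
missing = df.loc[df["Fruit"] == "kiwi", "Code"].values
fallback = next(iter(missing), None)

# Alternative (not from the answer): .item() insists on exactly one element.
only = df.loc[df["Fruit"] == "blueberry", "Code"].item()

print(first, fallback, only)
```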
Upvotes: 3
Reputation: 153560
Let's try this:
df.loc[df['Fruit'] == 'blueberry','Code'].values[0]
Output:
5
First, use .loc
to access the values in your dataframe, using boolean indexing for row selection and the column label for column selection. Then convert the returned series to an array of values; since there is only one value in that array, you can use the index '[0]' to get the scalar value from that single-element array.
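The steps above can be wrapped in a small helper, sketched here under the assumption that a missing fruit should raise an error rather than fail with a bare IndexError. The function name `lookup_code` and the fruit "kiwi" are hypothetical and only serve the illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})


def lookup_code(frame, fruit):
    """Return the first 'Code' value whose 'Fruit' equals `fruit` (hypothetical helper)."""
    # Boolean mask selects rows, the label 'Code' selects the column.
    values = frame.loc[frame["Fruit"] == fruit, "Code"].values
    if len(values) == 0:
        # Assumption: callers prefer an explicit error for a missing fruit.
        raise KeyError(f"no row with Fruit == {fruit!r}")
    # Position 0 of the array, independent of the DataFrame's row labels.
    return values[0]


print(lookup_code(df, "blueberry"))
```

Because [0] is applied to the numpy array rather than to the Series, it is a position and not the row label 4 from the original frame.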
Upvotes: 8