agftrading

Reputation: 894

How to extract values from a Pandas DataFrame, rather than a Series (without referencing the index)?

I am trying to return a specific item from a Pandas DataFrame via conditional selection (and do not want to have to reference the index to do so).

Here is an example:

I have the following dataframe:

  Code  Colour  Fruit
0   1   red     apple
1   2   orange  orange
2   3   yellow  banana
3   4   green   pear
4   5   blue    blueberry
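To make the example reproducible, here is a minimal sketch that rebuilds the same DataFrame. The column names and values are copied from the table above; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example DataFrame shown in the table above
df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})
print(df)
```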

I enter the following code to search for the code for blueberries:

df[df['Fruit'] == 'blueberry']['Code']

This returns:

4    5
Name: Code, dtype: int64

which is of type:

pandas.core.series.Series

but what I actually want to return is the number 5 of type:

numpy.int64

which I can do if I enter the following code:

df[df['Fruit'] == 'blueberry']['Code'][4]

i.e. referencing the index to give the number 5, but I do not want to have to reference the index!
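The `[4]` lookup works because boolean selection keeps the original index label of each matching row, so the filtered Series still carries label 4. A small sketch illustrating this, with the DataFrame rebuilt so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

mask = df['Fruit'] == 'blueberry'   # boolean Series, True only for the last row
codes = df[mask]['Code']            # still a Series; it keeps index label 4

print(type(codes))                  # <class 'pandas.core.series.Series'>
print(codes.index.tolist())         # [4]
print(codes[4])                     # 5, looked up by the label 4
```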

Is there another syntax that I can deploy here to achieve the same thing?

Thank you!...

Update:

One further idea is this code:

df[df['Fruit'] == 'blueberry']['Code'][df[df['Fruit']=='blueberry'].index[0]]

However, this does not seem particularly elegant (and it references the index). Is there a more concise and precise method that does not need to reference the index or is this strictly necessary?

Thanks!...

Upvotes: 8

Views: 32553

Answers (4)

chrisgal

Reputation: 1

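Easiest solution: convert the pandas.core.series.Series to an integer! This only works when exactly one row matches; with several matches, int() raises a TypeError, and pandas 2.1 and later also warn that calling int() on a single-element Series is deprecated in favour of int(ser.iloc[0]).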

my_code = int(df[df['Fruit'] == 'blueberry']['Code'])
print(my_code)

Outputs:

5  
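A hedged sketch of the same idea that makes the single-match assumption explicit: it checks how many rows matched, then takes the first element positionally with `.iloc[0]` before converting. The helper name `code_for` is hypothetical, introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

def code_for(frame, fruit):
    """Hypothetical helper: return the Code for `fruit` as a Python int."""
    matches = frame.loc[frame['Fruit'] == fruit, 'Code']
    if len(matches) != 1:
        # Converting to int only makes sense for exactly one matching row
        raise ValueError(f"expected exactly one match for {fruit!r}, got {len(matches)}")
    return int(matches.iloc[0])  # positional access, no index label needed

print(code_for(df, 'blueberry'))  # 5
```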

Upvotes: 0

Sam

Reputation: 11

You can also set your 'Fruit' column as an index:

df_fruit_index = df.set_index('Fruit')

and extract the value from the 'Code' column based on the fruit you choose

df_fruit_index.loc['blueberry','Code']
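A minimal sketch of this approach, with one caveat worth checking: `.loc[label, column]` returns a scalar only when the label is unique in the index. If a fruit name appears more than once, the same lookup returns a Series again.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

df_fruit_index = df.set_index('Fruit')
code = df_fruit_index.loc['blueberry', 'Code']  # scalar, since 'blueberry' is unique
print(code)                                     # 5

# With a duplicated label, the same lookup returns a Series instead of a scalar
dup = pd.concat([df, df]).set_index('Fruit')
print(type(dup.loc['blueberry', 'Code']))       # <class 'pandas.core.series.Series'>
```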

Upvotes: 0

jpp

Reputation: 164843

Referencing an index is a requirement (unless you use next()^), since a pd.Series is not guaranteed to contain exactly one value.

You can use pd.Series.values to extract the values as an array. This also works if you have multiple matches:

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values

# array([5], dtype=int64)

df2 = pd.concat([df]*5)
res = df2.loc[df2['Fruit'] == 'blueberry', 'Code'].values

# array([5, 5, 5, 5, 5], dtype=int64)
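Since pandas 0.24, the documentation for `Series.values` recommends `Series.to_numpy()` when a NumPy array is wanted. A short sketch showing that it gives the same arrays as the two examples above:

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

# Single match: a one-element array
res = df.loc[df['Fruit'] == 'blueberry', 'Code'].to_numpy()
print(res)   # [5]

# Multiple matches: one element per matching row
df2 = pd.concat([df] * 5)
res2 = df2.loc[df2['Fruit'] == 'blueberry', 'Code'].to_numpy()
print(res2)  # [5 5 5 5 5]
```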

To get a list from the numpy array, you can use .tolist():

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values.tolist()

Both the array and the list versions can be indexed intuitively, e.g. res[0] for the first item.
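Indexing with `res[0]` assumes at least one row matched; an empty selection raises `IndexError`. A hedged sketch that guards against that case. Returning `None` when nothing matches is an illustrative choice, not something the answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values.tolist()
first = res[0] if res else None  # None when nothing matched
print(first)                     # 5

missing = df.loc[df['Fruit'] == 'kiwi', 'Code'].values.tolist()
print(missing[0] if missing else None)  # None
```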

^ If you are really opposed to using an index, you can use next() to iterate:

next(iter(res))
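`next()` also accepts a default value, which avoids a `StopIteration` when nothing matches. A short sketch of both cases:

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values
print(next(iter(res)))               # 5

# With no match, the default is returned instead of raising StopIteration
none_found = df.loc[df['Fruit'] == 'kiwi', 'Code'].values
print(next(iter(none_found), None))  # None
```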

Upvotes: 3

Scott Boston

Reputation: 153560

Let's try this:

df.loc[df['Fruit'] == 'blueberry','Code'].values[0]

Output:

5

First, use .loc to access the values in your dataframe, using boolean indexing for row selection and the column label for column selection. Then convert the returned Series to an array of values. Since there is only one value in that array, you can use index [0] to get the scalar value from that single-element array.
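Two existing pandas alternatives to `.values[0]` may be worth knowing. `.iloc[0]` takes the first element by position, regardless of its index label. `Series.item()` returns the single element as a Python scalar but raises `ValueError` unless exactly one row matched, which makes the one-match assumption explicit. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Code': [1, 2, 3, 4, 5],
    'Colour': ['red', 'orange', 'yellow', 'green', 'blue'],
    'Fruit': ['apple', 'orange', 'banana', 'pear', 'blueberry'],
})

codes = df.loc[df['Fruit'] == 'blueberry', 'Code']

print(codes.iloc[0])  # 5, by position; the index label 4 is never mentioned
print(codes.item())   # 5, and only because exactly one row matched

# With several matches, item() refuses instead of silently picking one
df2 = pd.concat([df] * 5)
try:
    df2.loc[df2['Fruit'] == 'blueberry', 'Code'].item()
except ValueError as exc:
    print('item() refused:', exc)
```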

Upvotes: 8
