jaried

Reputation: 672

How do I get a scalar from one DataFrame column based on the value in another column?

I have a DataFrame with two columns, A and B. I want to get the scalar value in column B for the row where column A has a given value. I used loc and .values[0].

My data is fairly small, so performance is secondary; I mainly want to know whether this is the right way to write it. .values seems to be discouraged in favor of .to_numpy().

import pandas as pd
import numpy as np

df = pd.DataFrame()
df[['A', 'B']] = pd.DataFrame(np.arange(10).reshape((5, 2)))
df1 = df.loc[df['A'] == 4, 'B'].values[0]
print(df1)

The result is

5

Can this code be optimized?

df1 = df.loc[df['A'] == 4, 'B'].values[0]

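Timings: calling .to_numpy()[0] or .iloc[0] on the .loc result takes about the same time, while filtering the whole frame first is slower: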

%timeit df1 = df[df['A'] == 4].B.iloc[0]
723 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1=df.loc[df['A'] == 4, 'B'].to_numpy()[0]
513 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1 = df.loc[df['A'] == 4, 'B'].iloc[0]
521 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 1

Views: 58

Answers (1)

jezrael

Reputation: 863291

If you want an efficient lookup that returns a default value when the condition matches nothing, use next with iter:

a = next(iter(df.loc[df['A'] == 4, 'B']), 'no match')
print (a)
5

a = next(iter(df.loc[df['A'] == 1000, 'B']), 'no match')
print (a)
no match
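
As a minimal sketch, the same pattern can be wrapped in a small helper. The name first_match and its parameters are made up for illustration and are not part of pandas; the frame is the one from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question: A = 0, 2, 4, 6, 8 and B = 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])


def first_match(frame, key_col, key, value_col, default=None):
    """Return the first value of value_col where key_col == key, else default.

    Hypothetical helper, shown only to illustrate next(iter(...), default).
    """
    matches = frame.loc[frame[key_col] == key, value_col]
    # next() returns the default instead of raising when the Series is empty
    return next(iter(matches), default)


found = first_match(df, 'A', 4, 'B', default='no match')
missing = first_match(df, 'A', 1000, 'B', default='no match')
print(found)
print(missing)
```

If several rows match, only the first one is returned, which is the same behavior as .iloc[0] or .to_numpy()[0].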

If a match is always guaranteed, you can use Series.to_numpy, but it raises an error when nothing matches, so it is better avoided:

df.loc[df['A'] == 4, 'B'].to_numpy()[0]
# but this raises IndexError because nothing matches
# df.loc[df['A'] == 1000, 'B'].to_numpy()[0]
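
To make the failure mode concrete, here is a small sketch using the frame from the question. Indexing an empty array with [0] raises IndexError, so a caller who still wants to use to_numpy has to handle that case explicitly. The 'no match' fallback is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])

# No row has A == 1000, so the resulting array is empty
empty = df.loc[df['A'] == 1000, 'B'].to_numpy()

try:
    value = empty[0]
except IndexError:
    # Position 0 does not exist in an empty array
    value = 'no match'

print(value)
```

Checking empty.size before indexing would work as well; either way it is more code than the next(iter(...)) version.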

Upvotes: 1
