Reputation: 672
I have a dataframe with two columns, A and B. I want to get a scalar from column B based on the value of column A. I used loc and .values[0].
My data volume is relatively small; my main concern is whether the syntax of the code is correct. .values seems to be deprecated.
import pandas as pd
import numpy as np
# Build a 5x2 frame: A = 0, 2, 4, 6, 8 and B = 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['A', 'B'])
df1 = df.loc[df['A'] == 4, 'B'].values[0]
print(df1)
The result is
5
Can this code be optimized?
df1 = df.loc[df['A'] == 4, 'B'].values[0]
Converting to NumPy with to_numpy() is slightly faster in my timings:
%timeit df1 = df[df['A'] == 4].B.iloc[0]
723 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1=df.loc[df['A'] == 4, 'B'].to_numpy()[0]
513 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df1 = df.loc[df['A'] == 4, 'B'].iloc[0]
521 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
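The %timeit lines above only work in IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The labels and the number of repetitions are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Same data as in the question: A = 0, 2, 4, 6, 8 and B = 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=["A", "B"])

# The three variants from the timings above, wrapped as callables
candidates = {
    "mask, then .B.iloc[0]": lambda: df[df["A"] == 4].B.iloc[0],
    "loc, then .to_numpy()[0]": lambda: df.loc[df["A"] == 4, "B"].to_numpy()[0],
    "loc, then .iloc[0]": lambda: df.loc[df["A"] == 4, "B"].iloc[0],
}

repeats = 200  # arbitrary; raise it for more stable numbers
results = {}
for label, func in candidates.items():
    results[label] = func()  # keep the value to check all variants agree
    seconds = timeit.timeit(func, number=repeats)
    print(f"{label}: {seconds / repeats * 1e6:.0f} µs per call")
```

All three variants return the same scalar, so for a small dataframe the choice is mostly about readability.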
Upvotes: 1
Views: 58
Reputation: 863291
If you need an optimized solution that returns a default value when the condition matches nothing, use next with iter:
a = next(iter(df.loc[df['A'] == 4, 'B']), 'no match')
print (a)
5
a = next(iter(df.loc[df['A'] == 1000, 'B']), 'no match')
print (a)
no match
If a match is always guaranteed, you can use Series.to_numpy, but it fails when there is no match, so it is better to avoid it:
df.loc[df['A'] == 4, 'B'].to_numpy()[0]
# but this fails with IndexError when nothing matches
#df.loc[df['A'] == 1000, 'B'].to_numpy()[0]
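If the lookup is needed in several places, the next/iter pattern can be wrapped in a small helper. This is only a sketch: first_match and its parameter names are hypothetical, not part of pandas, and the column names A and B are taken from the question.

```python
import numpy as np
import pandas as pd


def first_match(frame, key, default="no match"):
    """Return the first B value where A == key, or default if none match.

    Hypothetical helper; the columns A and B are assumed to exist.
    """
    matches = frame.loc[frame["A"] == key, "B"]
    # next() returns the default instead of raising when the Series is empty
    return next(iter(matches), default)


df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=["A", "B"])

found = first_match(df, 4)
missing = first_match(df, 1000)
print(found)
print(missing)
```

Unlike to_numpy()[0], this never raises when the filter is empty, at the cost of returning a value of a different type for the no-match case.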
Upvotes: 1