Reputation: 125
So given a csv file as a dataframe using pandas and python, i'd like to get the value which is in the same row as another value as efficiently as possible.
To clarify this, i will give you an example with the following csv:
STAID SOUID DATE TX Q_TX
162 100522 19010101 -31 0
162 100522 19010102 -13 0
162 100522 19010103 -5 0
162 100522 19010104 -10 0
162 100522 19010105 -18 0
So lets say im implementing te following code
import pandas as pd
data = pd.read_csv("foo.csv")
max_val = data["TX"].max()
Max_val will now get a value of -5. The thing is that I would now like to know the value in 'DATE' which will be in the same row as max_val, or in other words: a value in the column 'DATE' sharing the same index as the found value. The desired value that I'm aiming for is 19010103. What is the most efficient way to do this only using pandas??
UPDATE: Derped a bit with the min_val, it should obviously be max_val instead of min_val.
Upvotes: 2
Views: 390
Reputation: 323226
We can using idxmax
df.DATE[df.TX.idxmax()]
Out[346]: 19010103
For enhance the speed
df.values[2,df.TX.values.argmax()]
Upvotes: 5
Reputation: 5126
Using loc
is the "standard" and should be highly readable. but using idxmax
with at
is your FASTEST answer here (SEE Wen's Answer for where i got the idea). You may want to test with your real data to ensure this small amount of data isn't providing red herrings. See at
:
Fast label-based scalar accessor Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
Fastest answer here from my testing:
min_val = data.TX.idxmax() #with min_val's index already set
%%timeit
data.at[min_val,'DATE']
# 100000 loops, best of 3: 6.73 µs per loop
using %%timeit
on jupyter you can see the time:
%%timeit
data.loc[data['TX'] == min_val]['DATE']
# 1000 loops, best of 3: 604 µs per loop
%%timeit
data[data['TX']==min_val)].DATE #using from comments, and not using loc
# 1000 loops, best of 3: 724 µs per loop
%%timeit
data[data['TX']==data['TX'].max()]['DATE']
#1000 loops, best of 3: 575 µs per loop
%%timeit
data.at[data.TX.idxmax(),'DATE'] #using at and idxmax <----
# 10000 loops, best of 3: 69.5 µs per loop
%%timeit
data.at[data.loc[data['TX'] == min_val].index[0],'DATE']
# 1000 loops, best of 3: 560 µs per loop
Upvotes: 2