Willem van der Spek
Willem van der Spek

Reputation: 125

Get value in csv file in same row

So given a csv file as a dataframe using pandas and python, i'd like to get the value which is in the same row as another value as efficiently as possible.

To clarify this, i will give you an example with the following csv:

STAID   SOUID   DATE      TX     Q_TX
162     100522  19010101 -31     0
162     100522  19010102 -13     0
162     100522  19010103 -5      0
162     100522  19010104 -10     0
162     100522  19010105 -18     0

So lets say im implementing te following code

import pandas as pd

data = pd.read_csv("foo.csv")

max_val = data["TX"].max()

Max_val will now get a value of -5. The thing is that I would now like to know the value in 'DATE' which will be in the same row as max_val, or in other words: a value in the column 'DATE' sharing the same index as the found value. The desired value that I'm aiming for is 19010103. What is the most efficient way to do this only using pandas??

UPDATE: Derped a bit with the min_val, it should obviously be max_val instead of min_val.

Upvotes: 2

Views: 390

Answers (2)

BENY
BENY

Reputation: 323226

We can using idxmax

df.DATE[df.TX.idxmax()]
Out[346]: 19010103

For enhance the speed

df.values[2,df.TX.values.argmax()]

Upvotes: 5

MattR
MattR

Reputation: 5126

Using loc is the "standard" and should be highly readable. but using idxmax with at is your FASTEST answer here (SEE Wen's Answer for where i got the idea). You may want to test with your real data to ensure this small amount of data isn't providing red herrings. See at:

Fast label-based scalar accessor Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.

Fastest answer here from my testing:

min_val = data.TX.idxmax() #with min_val's index already set
%%timeit
data.at[min_val,'DATE']
# 100000 loops, best of 3: 6.73 µs per loop

using %%timeit on jupyter you can see the time:

%%timeit
data.loc[data['TX'] == min_val]['DATE']
# 1000 loops, best of 3: 604 µs per loop

%%timeit
data[data['TX']==min_val)].DATE #using from comments, and not using loc
# 1000 loops, best of 3: 724 µs per loop

%%timeit
data[data['TX']==data['TX'].max()]['DATE']
#1000 loops, best of 3: 575 µs per loop 

%%timeit
data.at[data.TX.idxmax(),'DATE'] #using at and idxmax <----
# 10000 loops, best of 3: 69.5 µs per loop

%%timeit
data.at[data.loc[data['TX'] == min_val].index[0],'DATE']
# 1000 loops, best of 3: 560 µs per loop

Upvotes: 2

Related Questions