Marius
Marius

Reputation: 1060

get value from dataframe based on row values without using column names

I am trying to get a value situated on the third column from a pandas dataframe by knowing the values of interest on the first two columns, which point me to the right value to fish out. I do not know the row index, just the values I need to look for on the first two columns. The combination of values from the first two columns is unique, so I do not expect to get a subset of the dataframe, but only a row. I do not have column names and I would like to avoid using them.

Consider the dataframe df:

a    1    bla 
b    2    tra
b    3    foo
b    1    bar
c    3    cra

I would like to get tra from the second row, based on the b and 2 combination that I know beforehand. I've tried subsetting with

df = df.loc['b', :]

which returns all the rows with b on the same column (provided I've read the data with index_col = 0) but I am not able to pass multiple conditions on it without crashing or knowing the index of the row of interest. I tried both df.loc and df.iloc.

In other words, ideally I would like to get tra without even using row indexes, by doing something like:

df[(df[,0] == 'b' & df[,1] == `2`)][2]

Any suggestions? Probably it is something simple enough, but I have the tendency to use the same syntax as in R, which apparently is not compatible.

Thank you in advance

Upvotes: 2

Views: 1649

Answers (2)

Marius
Marius

Reputation: 1060

As @anky has suggested, a way to do this without knowing the column names nor the row index where your value of interest is, would be to read the file in a pandas dataframe using multiple column indexing.

For the provided example, knowing the column indexes at least, that would be:

df = pd.read_csv(path, sep='\t', index_col=[0, 1])

then, you can use:

df = df.iloc[df.index.get_loc(("b", 2)):]
df.iloc[0]

to get the value of interest.

Thanks again @anky for your help. If you found this question useful, please upvote @anky 's comment in the posted question.

Upvotes: 2

Daniel Lenz
Daniel Lenz

Reputation: 3857

I'd probably use pd.query for that:

import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'b', 'b', 'c'], data={"col1": [1, 2, 3, 1, 3], "col2": ['bla', 'tra', 'foo', 'bar', 'cra']})
df

   col1 col2
a   1   bla
b   2   tra
b   3   foo
b   1   bar
c   3   cra

df.query('col1 == 2 and col2 == "tra"')
    col1 col2
b   2   tra

Upvotes: 1

Related Questions