Murcielago
Murcielago

Reputation: 1005

Search into pandas dataframe column for which input value is smaller than the next index column value

I want to search a dataframe to find a value correspond to index row and column value. However, I am struggling because the column header values are the upper bound of a range where the previous column header value is the lower bound ( but the uppper bound of its other neightbor column value).

I cannot find a way to match my input with the column corresponding to the lower bound of the range.

with an example it is very easy to see:

data_frame_test = pd.DataFrame({'location' : [1, 2, 'S'],
                                '200' : [342, 690, 103],
                                '1000' : [322, 120, 193],
                                '2000' : [249, 990, 403]})

data_frame_test = data_frame_test.set_index('location')

and what I want to do is this

location = 1
weight = 500

output = data_frame_test.iloc[1][??] #must be equal to 342

see, the column where weight must look into is 200, because it is in the range between ]200;1000[. I don't know what else to try to translate that into python code. Any help would be greatly appreciated.

Upvotes: 1

Views: 414

Answers (3)

jezrael
jezrael

Reputation: 863226

First convert columns to integers by rename if necessary:

data_frame_test = data_frame_test.set_index('location').rename(columns=int)
print (data_frame_test)
          200   1000  2000
location                  
1          342   322   249
2          690   120   990
S          103   193   403

weight = 500
location = 1

And then match values by positions with DataFrame.loc with last position of True value by compare by less values like weight:

#https://stackoverflow.com/a/8768734
b = (data_frame_test.columns[::-1] < weight)
pos = len(b) - np.argmax(b) - 1
print (pos)
0

output = data_frame_test.loc[location, data_frame_test.columns[pos]]
print (output)
342

Or you can use DataFrame.iloc with position by Index.get_loc:

output = data_frame_test.iloc[data_frame_test.index.get_loc(location), pos]
print (output)
342

Upvotes: 1

yatu
yatu

Reputation: 88276

I can think of cansting the columns to int, and indexing the dataframe using Series.searchsorted, and boolean indexing on the index:

location = 1
weight = 500 

data_frame_test.iloc[data_frame_test.index==location, 
                     data_frame_test.columns.astype('int').searchsorted(weight)-1]

location
1    342
Name: 200, dtype: int64

Upvotes: 1

Mohit Sharma
Mohit Sharma

Reputation: 590

You can make a custom function which will iterate over columns to check correct column and then return cell of that location:

def get_val(location, weight, df):
    col = df.columns[0]
    for c in df.columns:
        if weight >= int(c):
            col = c
        else:
            break
    return df.loc[location, col]
get_val(1, 500, data_frame_test)

Upvotes: 1

Related Questions