Getting a value out of pandas dataframe based on a set of conditions

Question

I have a dataframe as shown below

     Token          Label  StartID  EndID  
0   Germany         Country   0      2     
1   Berlin          Capital   6      9       
2   Frankfurt       City      15     18        
3   four   million  Number    21     24        
4   Sweden          Country   26     27       
5   United Kingdom  Country   32     34     
6   ten million     Number    40     45
7   London          Capital   50     55

I am trying to get row based on certain condition, i.e. associate the label Number to closest capital i.e. Berlin

3   four   million  Number    21     24   - > 1   Berlin          Capital   6      9

or something like:

df[row3] -> df [row1]

A pseudo logic

First check, for the rows with label: Number then (assumption is that the city is always '2 rows' above or below) and has the label: Capital. But, label: 'capital' loc is always after the label: Country

What I have done until now,

columnsName =['Token', 'Label', 'StartID', 'EndID']
df = pd.read_csv('resources/testcsv.csv', index_col= 0, skip_blank_lines=True, header=0)
print(df)

key_number = 'Number'
df_with_number = (df[df['Label'].str.lower().str.contains(r"\b{}\b".format(key_number), regex=True, case=False)])
print(df_with_number)

key_capital = 'Capital'
df_with_capitals = (df[df['Label'].str.lower().str.contains(r"\b{}\b".format(key_capital), regex=True, case=False)])
print(df_with_capitals)

key_country = 'Country'
df_with_country = (df[df[1].str.lower().str.contains(r"\b{}\b".format(key_country), regex=True, case=False)])
print(df_with_country)

The logic is to compare the index's and then make possible relations

i.e.

df[row3] -> [ df [row1], df[row7]]

Ben.T · Accepted Answer

you could use merge_asof with the parameter direction=nearest for example:

df_nb_cap = pd.merge_asof(df_with_number.reset_index(), 
                          df_with_capitals.reset_index(), 
                          on='index', 
                          suffixes=('_nb', '_cap'), direction='nearest')
print (df_nb_cap)
   index      Token_nb Label_nb  StartID_nb  EndID_nb Token_cap Label_cap  \
0      3  four_million   Number          21        24    Berlin   Capital   
1      6   ten_million   Number          40        45    London   Capital   

   StartID_cap  EndID_cap  
0            6          9  
1           50         55

Getting a value out of pandas dataframe based on a set of conditions

Answers (2)

Related Questions