Pandas data frame - filter rows by values from another data frame

Question

Assume I have two pandas data frames, that their relevant columns are:

stimuli data frame:

   stimuli_id    rank         
0     23          0  
1     27          1 
2     62          2 
3     88          2 
4     99          1

while 'stimuli_id' is a unique index, and 'rank' is a integer in range of [0,2]. Relevant columns from trials data frame is:

     stim1     stim2        
0     23         27
1     27         62   
2     62         99

While both stim1 and stim2 represent stimuli_id from stimuli data frame.

Now I want to filter all rows in trials data frame where the rank of the second stimuli is greater. So the example above after filtering should look like this:

       stim1     stim2        
0       62         99

So eventually only for this trial stim1 is greater than stim2, and the rest are not so we filter them.

I have tried the following:

trials.loc[stimuli.loc[stimuli["stimuli id"] == trials["stim1"]].iloc[0]["rank"] > stimuli.loc[stimuli["stimuli id"] == trials["stim2"]].iloc[0]["rank"]]

But a value error has been raised:

{ValueError}Can only compare identically-labeled Series objects

I have been searching for hours for any solution but found nothing helpful.

ALollz · Accepted Answer

Since 'stimuli_id' is a unique key for that DataFrame, use the Series to map the different stim columns to ranks and check the comparison. (By "rank of the second stimuli is greater" I assume you mean a smaller number).

s = stimuli.set_index('stimuli_id')['rank']

trials[trials['stim2'].map(s) < trials['stim1'].map(s)]
#   stim1  stim2
#2     62     99

With mapping each column we are logically creating the mask with the following comparison:

#rank2      rank1
#    1   <      0   # False
#    2   <      1   # False
#    1   <      2   # True

Pandas data frame - filter rows by values from another data frame

Answers (1)

Related Questions