Reputation: 25
Assume I have two pandas data frames, that their relevant columns are:
stimuli data frame:
stimuli_id rank
0 23 0
1 27 1
2 62 2
3 88 2
4 99 1
while 'stimuli_id' is a unique index, and 'rank' is a integer in range of [0,2]. Relevant columns from trials data frame is:
stim1 stim2
0 23 27
1 27 62
2 62 99
While both stim1 and stim2 represent stimuli_id from stimuli data frame.
Now I want to filter all rows in trials data frame where the rank of the second stimuli is greater. So the example above after filtering should look like this:
stim1 stim2
0 62 99
So eventually only for this trial stim1 is greater than stim2, and the rest are not so we filter them.
I have tried the following:
trials.loc[stimuli.loc[stimuli["stimuli id"] == trials["stim1"]].iloc[0]["rank"] > stimuli.loc[stimuli["stimuli id"] == trials["stim2"]].iloc[0]["rank"]]
But a value error has been raised:
{ValueError}Can only compare identically-labeled Series objects
I have been searching for hours for any solution but found nothing helpful.
Upvotes: 1
Views: 268
Reputation: 59579
Since 'stimuli_id'
is a unique key for that DataFrame, use the Series to map
the different stim columns to ranks and check the comparison. (By "rank of the second stimuli is greater" I assume you mean a smaller number).
s = stimuli.set_index('stimuli_id')['rank']
trials[trials['stim2'].map(s) < trials['stim1'].map(s)]
# stim1 stim2
#2 62 99
With mapping each column we are logically creating the mask with the following comparison:
#rank2 rank1
# 1 < 0 # False
# 2 < 1 # False
# 1 < 2 # True
Upvotes: 3