Reputation: 343
I have two dataframes with the same two columns: 1) the response key pressed and 2) the response time in seconds (s.ms). The first holds a participant's responses. For example,
subjectData =
Key RT
0 v 2.20
1 v 4.34
2 v 5.51
3 v 10.39
4 w 12.50
5 v 14.62
6 v 20.22
I also have a dataframe of the 'correct' responses and times. For example,
correctData =
Key RT
0 v 2.25
1 w 4.34
2 v 5.61
3 v 20.30
I want to flag a match when both the response key matches AND the response time is within ±1 second. So, first check that the response key matches, and if it does, compare the time at which the response occurred. If it occurred within 1 s, it is deemed correct. Notice that the subject may have responded more times than there are correct responses, so I want to compare these columns regardless of order. For example, notice above that the response at index 6 in subjectData matches the entry at index 3 in correctData (within one second). Because of this, the entry at index 3 of the output is TRUE, indicating that this correct answer was matched.
So the end result should look like this
TRUE
FALSE
TRUE
TRUE
Notice that the output has the same length as the correctData dataframe and indicates which correct answers are matched by something in subjectData. In other words, the subject got an item right IF they pressed the correct key within one second of the 'correct' time listed in correctData. Please note that these dataframes will most likely NOT be the same length (the subject may respond more or fewer times than the 'correct' number of responses), so a plain row-by-row 'join' may not work here.
Any ideas on how to do this most efficiently?
Upvotes: 0
Views: 134
Reputation: 7088
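import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})
# Pair every correct row with every subject row that has the same Key;
# reset_index() keeps the correctData row label in an 'index' column.
df = subjectData.merge(correctData.reset_index(), on='Key', how='right',
                       suffixes=('_subj', '_corr'))
# Keep only the pairs whose response times differ by at most 1 second.
df_timed = df[(df['RT_subj'] - df['RT_corr']).between(-1, 1)]
# Flag each correctData row that has at least one surviving pair.
correctData.index.isin(df_timed['index'])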
Output:
array([ True, False, True, True])
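If the per-key cross product produced by the merge gets large, a possible alternative is pd.merge_asof, which looks up only the nearest subject response for each correct row. This is a minimal sketch, not a drop-in replacement: the helper name match_nearest and the tol parameter are made up for illustration, and it assumes RT contains no NaN values.

```python
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})


def match_nearest(subject, correct, tol=1.0):
    """Hypothetical helper: True for each correct row that has a subject
    response with the same Key within `tol` seconds."""
    # merge_asof needs both frames sorted by the `on` column.
    left = correct.reset_index().sort_values('RT')
    # Copy RT so we can tell afterwards whether a partner row was found.
    right = subject.sort_values('RT').assign(RT_subj=lambda d: d['RT'])
    merged = pd.merge_asof(left, right, on='RT', by='Key',
                           tolerance=tol, direction='nearest')
    # RT_subj is NaN where no same-Key response lies within the tolerance.
    found = merged.set_index('index')['RT_subj'].notna()
    # Restore the original row order of `correct`.
    return found.reindex(correct.index).to_numpy()


matched = match_nearest(subjectData, correctData)
print(matched)
```

Checking only the nearest same-key response is enough here, because if any response falls inside the window, the nearest one does too.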
Upvotes: 2
Reputation: 59
See if this works.
cutoff_at_index = min(correctData.shape[0], subjectData.shape[0])
equal = subjectData.Key[:cutoff_at_index] == correctData.Key[:cutoff_at_index]
between = (subjectData.RT[:cutoff_at_index] >= correctData.RT[:cutoff_at_index] - 1) \
          & (subjectData.RT[:cutoff_at_index] <= correctData.RT[:cutoff_at_index] + 1)
equal & between
Upvotes: 0
Reputation: 294516
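I'd use numpy.isclose. The comparison is row by row, so both frames are first cut to the shorter length (comparing Series of different lengths with == raises an error):
import numpy as np
n = min(len(subjectData), len(correctData))
# rtol=0 so that only the absolute tolerance of 1 second applies
(subjectData.Key[:n] == correctData.Key[:n]) & np.isclose(subjectData.RT[:n], correctData.RT[:n], rtol=0, atol=1)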
0 True
1 False
2 True
3 False
Name: Key, dtype: bool
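The row-by-row check above does not cover the "regardless of order" part of the question. One way np.isclose could be extended to that case is by broadcasting every subject row against every correct row. This is only a sketch: the function name match_any_order and the tol parameter are hypothetical, and the intermediate array has len(subject) x len(correct) elements, so it suits modest sizes.

```python
import numpy as np
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})


def match_any_order(subject, correct, tol=1.0):
    """Hypothetical helper: True for each correct row matched by any
    subject row (same Key, RT within `tol` seconds)."""
    # Shape (n_subject, n_correct): does subject row i have the key of correct row j?
    key_eq = subject['Key'].to_numpy()[:, None] == correct['Key'].to_numpy()[None, :]
    # Same shape: are the two response times within tol seconds of each other?
    rt_close = np.isclose(subject['RT'].to_numpy()[:, None],
                          correct['RT'].to_numpy()[None, :],
                          rtol=0, atol=tol)
    # A correct row counts as matched if any subject row satisfies both tests.
    return (key_eq & rt_close).any(axis=0)


result = match_any_order(subjectData, correctData)
print(result)  # [ True False  True  True]
```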
Upvotes: 1
Reputation: 30930
1) Use Series.eq to compare the Key column of both dataframes. The comparison is row by row, so first cut both frames to the same length:
n = min(len(subjectData), len(correctData))
subj, corr = subjectData.iloc[:n], correctData.iloc[:n]
cond1 = subj['Key'].eq(corr['Key'])
2) Then check whether the response time is within ±1 s:
cond2 = (subj['RT'] < (corr['RT'] + 1)) & (subj['RT'] > (corr['RT'] - 1))
3) Finally, check which rows meet both conditions (cond1, cond2):
cond1 & cond2
0 True
1 False
2 True
3 False
dtype: bool
Upvotes: 1