Comparing columns of different dataframes conditionally in python

I have two dataframes with the same two columns: 1) the response key pressed by a participant and 2) the response time in seconds and milliseconds (s.ms). For example,

subjectData = 

  Key     RT
0   v   2.20
1   v   4.34
2   v   5.51
3   v  10.39
4   w  12.50
5   v  14.62
6   v  20.22

I also have a dataframe that holds the 'correct' responses and times. For example,

correctData = 

  Key     RT
0   v   2.25
1   w   4.34
2   v   5.61
3   v  20.30

I want to flag a match when both the response key matches AND the response time is within ±1 second. So, first check that the response key matches and, if it does, compare the times of the two responses. If the subject's response occurred within 1 s of the correct time, it is deemed correct. Note that the subject may have responded more times than there are correct responses, so I want to compare these columns regardless of order. For example, the response at index 6 in subjectData matches the entry at index 3 in correctData (within one second), so the entry at index 3 of the output is TRUE, indicating that this correct answer was matched.

So the end result should look like this:

TRUE
FALSE
TRUE
TRUE

Notice that the output is the same length as the correctData dataframe and indicates which correct answers were matched in subjectData. That is, the subject got an answer right IF they pressed the correct button within one second of the 'correct' time listed in correctData. Please note that these dataframes will most likely NOT be the same length (the subject may respond more or fewer times than the 'correct' number of responses), so 'join' may not work here.

Any ideas on how to do this most efficiently?

Upvotes: 0

Answers (4)

EliadL

Reputation: 7088

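import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})

correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

# Pair every correct row with every subject row that has the same key,
# keeping the original correctData row label in the 'index' column.
df = subjectData.merge(correctData.reset_index(), on='Key', how='right',
                       suffixes=['_subj', '_corr'])

# Keep only the pairs whose times differ by at most 1 s.
df_timed = df[(df['RT_subj'] - df['RT_corr']).between(-1, 1)]

# A correct row is matched if it survives the time filter at least once.
correctData.index.isin(df_timed['index'])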

Output:

array([ True, False,  True,  True])
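The same order-independent check can also be written without a merge. The following is only a rough sketch using NumPy broadcasting, not part of the answer above; it assumes the frames are small enough to hold a len(correctData) x len(subjectData) boolean matrix, and the names key_match, time_match, and matched are illustrative.

```python
import numpy as np
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

corr_key = correctData['Key'].to_numpy()[:, None]   # shape (4, 1)
subj_key = subjectData['Key'].to_numpy()[None, :]   # shape (1, 7)
corr_rt = correctData['RT'].to_numpy()[:, None]
subj_rt = subjectData['RT'].to_numpy()[None, :]

# key_match[i, j]: correct row i and subject row j have the same key
key_match = corr_key == subj_key
# time_match[i, j]: the two times differ by at most 1 s
time_match = np.abs(corr_rt - subj_rt) <= 1

# A correct row counts as matched if any subject row satisfies both tests
matched = (key_match & time_match).any(axis=1)
print(matched)  # [ True False  True  True]
```

Note that, like the merge, this lets one subject response satisfy more than one correct row; enforcing a one-to-one pairing would need extra logic.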

Upvotes: 2

DarkDrassher34

Reputation: 59

See if this works.

# Compare row by row (by position), using only the rows both frames share.
cutoff_at_index = min(correctData.shape[0], subjectData.shape[0])
equal = subjectData.Key[:cutoff_at_index] == correctData.Key[:cutoff_at_index]
# True where the subject's time is within 1 s of the correct time.
between = ((subjectData.RT[:cutoff_at_index] >= correctData.RT[:cutoff_at_index] - 1)
           & (subjectData.RT[:cutoff_at_index] <= correctData.RT[:cutoff_at_index] + 1))
equal & between

Upvotes: 0

piRSquared

Reputation: 294516

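I'd use numpy.isclose, after trimming both frames to the same length (comparing Series of different lengths with == raises an error):

import numpy as np

n = min(len(subjectData), len(correctData))
subj, corr = subjectData.iloc[:n], correctData.iloc[:n]
(subj.Key == corr.Key) & np.isclose(subj.RT, corr.RT, atol=1)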

0     True
1    False
2     True
3    False
Name: Key, dtype: bool
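One detail worth knowing about np.isclose with atol=1: the default rtol of 1e-05 still applies, so the allowed gap is atol + rtol * abs(b), slightly more than 1 s. The small sketch below illustrates this with made-up times (correct_rt and subject_rt are hypothetical values, not taken from the question); passing rtol=0 gives a strict 1 s window.

```python
import numpy as np

correct_rt = 20.0
subject_rt = 21.0001  # hypothetical response, 1.0001 s from the correct time

# Default rtol=1e-05: allowed gap is atol + rtol * abs(correct_rt) = 1.0002
loose = bool(np.isclose(subject_rt, correct_rt, atol=1))

# rtol=0: allowed gap is exactly atol = 1
strict = bool(np.isclose(subject_rt, correct_rt, atol=1, rtol=0))

print(loose, strict)  # True False
```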

Upvotes: 1

ansev

Reputation: 30930

1) Trim both dataframes to the same length, then use Series.eq to compare the key columns row by row:

n = min(len(subjectData), len(correctData))
subj, corr = subjectData.iloc[:n], correctData.iloc[:n]
cond1 = subj['Key'].eq(corr['Key'])

2) Then check whether the response time is within ±1 s:

cond2 = (subj['RT'] < (corr['RT'] + 1)) & (subj['RT'] > (corr['RT'] - 1))

3) Finally, check which rows meet both conditions (cond1, cond2):

cond1 & cond2

0     True
1    False
2     True
3    False
dtype: bool
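The comparison above pairs rows by position, which is why the row at index 3 comes out False even though the question expects True. As one possible order-independent sketch (a different technique from this answer), pd.merge_asof can look up, for each correct row, the nearest subject response with the same key within a tolerance of 1 s. The helper column RT_subj and the names left, right, nearest, and matched are illustrative.

```python
import pandas as pd

subjectData = pd.DataFrame({'Key': ['v', 'v', 'v', 'v', 'w', 'v', 'v'],
                            'RT': [2.20, 4.34, 5.51, 10.39, 12.50, 14.62, 20.22]})
correctData = pd.DataFrame({'Key': ['v', 'w', 'v', 'v'],
                            'RT': [2.25, 4.34, 5.61, 20.30]})

# merge_asof requires both frames to be sorted by the "on" column.
left = correctData.reset_index().sort_values('RT')
# Copy the subject time into a helper column so it survives the merge.
right = subjectData.assign(RT_subj=subjectData['RT']).sort_values('RT')

# For each correct row, take the nearest subject row with the same Key,
# but only if its time is within 1 s; otherwise RT_subj is NaN.
nearest = pd.merge_asof(left, right, on='RT', by='Key',
                        tolerance=1, direction='nearest')

# Restore the original correctData order and flag the matched rows.
matched = nearest.set_index('index')['RT_subj'].notna().sort_index()
print(matched.tolist())  # [True, False, True, True]
```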

Upvotes: 1
