Reputation: 97
I'm trying to create a cognitive task called the 2-back test.
I created a semi-random list that satisfies certain conditions, and now I want to know what the correct answer is for the participant on each trial.
I want a column in my dataframe that says 'yes' or 'no' depending on whether the letter two positions earlier was the same letter.
Here is my code:
from random import choice, shuffle
import pandas as pd
num = 60
letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']
# letters_1 = [1, 2, 3, 4, 5, 6]
my_list = [choice(letters), choice(letters)]
probab = list(range(num - 2))
shuffle(probab)
# We want 20% of the letters to repeat the letter 2 letters back
pourc = 20
repeatnum = num * pourc // 100
for i in probab:
    ch = prev = my_list[-2]
    if i >= repeatnum:
        while ch == prev:
            ch = choice(letters)
    my_list.append(ch)
df = pd.DataFrame(my_list, columns=["letters"])
df.head(10)
letters
0 F
1 I
2 D
3 I
4 H
5 C
6 L
7 G
8 D
9 L
# Create a list to store the data
response = []
# For each row in the column,
for i in df['letters']:
    # if more than a value,
    if i == [i - 2]:
        response.append('yes')
    else:
        response.append('no')
# Create a column from the list
df['response'] = response
First error :
if i == [i - 2]:
TypeError: unsupported operand type(s) for -: 'str' and 'int'
If I use numbers instead of letters, I can get past this error, but I would prefer to keep letters.
However, when I run it with numbers, I get no errors, but my new column response only contains 'no'. But I know that 12 times it should be 'yes'.
Upvotes: 1
Views: 55
Reputation: 402473
It seems like you want to compare the column with the same column shifted by two elements. Use shift + np.where:
import numpy as np
df['response'] = np.where(df.letters.eq(df.letters.shift(2)), 'yes', 'no')
df.head(10)
letters response
0 F no
1 I no
2 D no
3 I yes
4 H no
5 C no
6 L no
7 G no
8 D no
9 L no
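To see why this comparison works, it may help to place the shifted column next to the original. The following is only a small sketch that uses the first ten letters from the question's sample output; the names `df_demo` and `two_back` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# First ten letters from the question's sample output
df_demo = pd.DataFrame({"letters": list("FIDIHCLGDL")})

# shift(2) moves every value down two rows; the first two rows become missing
df_demo["two_back"] = df_demo["letters"].shift(2)

# eq() compares element-wise; a missing value never equals a letter,
# so the first two rows are always 'no'
df_demo["response"] = np.where(
    df_demo["letters"].eq(df_demo["two_back"]), "yes", "no"
)
print(df_demo)
```

Only row 3 gets 'yes' here, because its letter (I) matches the letter in row 1.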
You wrote: "But I know that 12 times it should be 'yes'." You can check the count like this:
df.response.eq('yes').sum()
12
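As for the loop in the question: `for i in df['letters']` yields the letters themselves, not their positions, so `i - 2` tries to subtract 2 from a string and raises the TypeError. With numbers, `i == [i - 2]` compares a number to a one-element list, which is never equal, hence only 'no'. If you prefer an explicit loop, a minimal sketch that iterates over positions could look like the following; `two_back_responses` is a hypothetical helper name, not something from pandas.

```python
def two_back_responses(seq, n=2):
    """Return 'yes'/'no' per item: 'yes' when it equals the item n positions earlier."""
    responses = []
    for pos, item in enumerate(seq):
        # The first n items have nothing n steps back, so they are always 'no'
        if pos >= n and item == seq[pos - n]:
            responses.append("yes")
        else:
            responses.append("no")
    return responses


# First ten letters from the question's sample output
sample = list("FIDIHCLGDL")
print(two_back_responses(sample))
```

With the dataframe from the question, this could be applied as `df['response'] = two_back_responses(df['letters'].tolist())`. The vectorised shift approach is usually shorter and faster, but the loop makes the position-based comparison explicit.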
Upvotes: 1