Iterate in a dataframe with strings

Question

I'm trying to create a cognitive task called the 2-back test.

I created a semi-random list of letters under certain conditions, and now I want to know what the correct answer is for the participant at each position.

I want a column in my dataframe that says 'yes' or 'no' depending on whether the letter two positions earlier was the same letter.

Here is my code:

from random import choice, shuffle
import pandas as pd

num = 60

letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']

# letters_1 = [1, 2, 3, 4, 5, 6]

my_list = [choice(letters), choice(letters)]
probab = list(range(num - 2))
shuffle(probab)

# We want 20% of the letters to repeat the letter 2 letters back
pourc = 20
repeatnum = num * pourc // 100
for i in probab:
    ch = prev = my_list[-2]
    if i >= repeatnum:
        while ch == prev:
            ch = choice(letters)
    my_list.append(ch)

df = pd.DataFrame(my_list, columns=["letters"])

df.head(10)
  letters
0       F
1       I
2       D
3       I
4       H
5       C
6       L
7       G
8       D
9       L

# Create a list to store the data
response = []

# For each row in the column,
for i in df['letters']:
    # if more than a value,
    if i == [i - 2]:
        response.append('yes')
    else:
        response.append('no')

# Create a column from the list
df['response'] = response

First error:

if i == [i - 2]:
TypeError: unsupported operand type(s) for -: 'str' and 'int'

If I use numbers instead of letters, I can get past this error, but I would prefer to keep letters.

However, when I run it with numbers, I get no errors, but my new response column contains only 'no', even though I know it should be 'yes' 12 times.

cs95 · Accepted Answer

It seems like you want to compare the column with the same column shifted down by two rows. Use shift together with np.where:

import numpy as np

df['response'] = np.where(df.letters.eq(df.letters.shift(2)), 'yes', 'no')
df.head(10)

  letters response
0       F       no
1       I       no
2       D       no
3       I      yes
4       H       no
5       C       no
6       L       no
7       G       no
8       D       no
9       L       no
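
To make the shift step easier to follow, here is a minimal sketch on a small fixed sample. The sample letters and the helper column name two_back are assumptions chosen for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Small fixed sample (hypothetical data, chosen so the result is easy to check by eye)
df = pd.DataFrame({"letters": list("FIDIHCHG")})

# shift(2) moves every value down two rows; the first two rows become NaN
df["two_back"] = df["letters"].shift(2)

# eq() compares element-wise; NaN never equals a letter, so rows 0 and 1 are 'no'
df["response"] = np.where(df["letters"].eq(df["two_back"]), "yes", "no")

print(df)
```

Rows 3 and 6 come out as 'yes' here, because the letter two rows above them is the same (I and H respectively).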

You wrote: "But I know that 12 times it should be 'yes'." You can check the count directly:

df.response.eq('yes').sum()
12
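
As for why the original loop failed: in `for i in df['letters']`, `i` is the letter itself, not its position, so `i - 2` tries to subtract a number from a string. With numbers, `[i - 2]` builds a one-element list, and a number never equals a list, so every row gets 'no'. If you prefer an explicit loop, a minimal sketch could look like the following. The sample letters are an assumption for illustration, and this is slower than the shift approach on large frames.

```python
import pandas as pd

# Hypothetical sample data; replace with your own generated list
df = pd.DataFrame({"letters": list("FIDIHCHG")})

letters = df["letters"].tolist()
response = []

# enumerate() gives the position as well as the letter
for pos, letter in enumerate(letters):
    # Guard pos >= 2: otherwise letters[pos - 2] would wrap around
    # to the end of the list through negative indexing
    if pos >= 2 and letter == letters[pos - 2]:
        response.append("yes")
    else:
        response.append("no")

df["response"] = response
print(df)
```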
