Reputation: 1772
This example df6:
df6 = pd.DataFrame({
'answer1': ['Lo', 'New York', 'Toronto'],
'answer2': ['London', 'New', 'Paris'],
'answer3': ['CA', 'CA', 'CA'],
'correct': [["'London'","'CA'"], ["'New York'"], ["'Toronto'"]]
})
df6 gives:
answer1 answer2 answer3 correct
0 Lo London CA ['London','CA']
1 New York New CA ['New York']
2 Toronto Paris CA ['Toronto']
What I am trying to do is compare the exact answer from columns have the name "answer" with the correct column and if both match, the column name will be added to a new column.
My issue is I cannot find the right approach to do so.
This what I have tried even with the help with other SO members:
cols = df6.filter(like='answer').columns
df6['Answer'] = df6[cols].apply(lambda s: ', '.join(cols[(m:=[str(s[col]) in str(df6.loc[s.name, 'correct']) for col in cols])]) , axis=1)
This code gives:
answer1 answer2 answer3 correct Answer
0 Lo London CA ['London','CA'] answer1, answer2,answer3
1 New York New CA ['New York'] answer1, answer2
2 Toronto Paris CA ['Toronto'] answer1
The result in row 0 is not accurate because the Answer
column should have only answer2, answer3
which match ['London','CA'] .. same also for row 1
so in comparing results I think I should use ==
instead of in
:
[str(s[col]) in str(df6.loc[s.name, 'correct']
Please note that I needed to wrap all in str
to overcome dtype issues while matching
The desired output:
answer1 answer2 answer3 correct Answer
0 Lo London CA ['London','CA'] answer2, answer3
1 New York New CA ['New York'] answer1
2 Toronto Paris CA ['Toronto'] answer1
Upvotes: 0
Views: 490
Reputation: 120479
Flatten your dataframe to compare one answer to one correct answer:
df6['Answer'] = df6.explode('correct').melt('correct', ignore_index=False) \
.query('value == correct').groupby(level=0)['variable'] \
.apply(', '.join)
print(df6)
# Output
answer1 answer2 answer3 correct Answer
0 Lo London CA [London, CA] answer2, answer3
1 New York New CA [New York] answer1
2 Toronto Paris CA [Toronto] answer1
After explode_melt
, your dataframe looks like:
>>> df6.explode('correct').melt('correct', ignore_index=False).sort_index()
correct variable value
0 London answer1 Lo
0 CA answer1 Lo
0 London answer2 London
0 CA answer2 London
0 London answer3 CA
0 CA answer3 CA
1 New York answer1 New York
1 New York answer2 New
1 New York answer3 CA
2 Toronto answer1 Toronto
2 Toronto answer2 Paris
2 Toronto answer3 CA
Assuming, the correct
column is a list of string. From your setup of df6
, I used pd.eval
to normalize the column:
df6['correct'] = pd.eval(df6['correct'])
Upvotes: 2
Reputation: 600
This will also work:
import pandas as pd
df6 = pd.DataFrame({
'answer1': ['Lo', 'New York', 'Toronto'],
'answer2': ['London', 'New', 'Paris'],
'answer3': ['CA', 'CA', 'CA'],
'correct': [['London','CA'], ['New York'], ['Toronto']]
})
correct_answer_column = []
for i, row in df6.iterrows():
correct_answer = []
if row['answer1'] in row['correct']:
correct_answer.append('answer1')
if row['answer2'] in row['correct']:
correct_answer.append('answer2')
if row['answer3'] in row['correct']:
correct_answer.append('answer3')
correct_answer_column.append(correct_answer)
correct_answer_column = pd.DataFrame(correct_answer_column)
final = pd.concat([df6, correct_answer_column], axis=1)
print(final)
Upvotes: 1
Reputation: 1030
Try this:
import pandas as pd
df6 = pd.DataFrame({
'answer1': ['Lo', 'New York', 'Toronto'],
'answer2': ['London', 'New', 'Paris'],
'answer3': ['CA', 'CA', 'CA'],
'correct': [["'London'","'CA'"], ["'New York'"], ["'Toronto'"]]
})
cols = df6.filter(like='answer').columns
def find_answers(answers, l):
ans = []
for a, c in zip(answers, cols):
print(l)
if "'" + a + "'" in l:
ans.append(c)
return ', '.join(ans)
df6['Answer'] = df6.apply(lambda x: find_answers([x.answer1, x.answer2, x.answer3], x.correct), axis=1)
print(df6)
Upvotes: 1