Reputation: 9
We have students answer MCQs after each lesson on Socrative. They enter their name first, then answer. For each lesson, we collect data from the Socrative platform but have trouble normalizing the names: variants such as 'John Doe', 'johndoe' or 'John,Doe' should all be transformed into 'doe', as it is written in our main file.
Our main file for tracking students (loaded as a pandas dataframe in Python) initially has just 1 column, the name (as a string, e.g. 'doe' for Mr. John Doe).
I'd like to write a function that goes through the 'name' column of my lesson1 dataframe and, for each value, replaces the badly typed name with the reference name.
To lowercase the names and remove extra spaces and punctuation, I've used the following code:
import re

lesson1["name"] = lesson1["name"].str.lower()
lesson1["name"] = lesson1["name"].str.strip()
# Remove every character that is not a letter or a digit
lesson1["name"] = lesson1["name"].apply(lambda x: re.sub('[^A-Za-z0-9]+', '', x))
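For reference, the same clean-up can probably be written with pandas' vectorized string methods alone, without `apply`. This is only a minimal sketch: the helper name `normalize_names` is hypothetical, and the sample rows are the three spellings quoted in the question.

```python
import pandas as pd


def normalize_names(names: pd.Series) -> pd.Series:
    """Lowercase, strip, and drop every character that is not a letter or digit."""
    return (
        names.str.lower()
        .str.strip()
        .str.replace(r"[^a-z0-9]+", "", regex=True)
    )


# Sample data built from the spellings mentioned in the question
lesson1 = pd.DataFrame({"name": ["John Doe", " johndoe ", "John,Doe"]})
lesson1["name"] = normalize_names(lesson1["name"])
print(lesson1["name"].tolist())  # ['johndoe', 'johndoe', 'johndoe']
```

Assigning the result back to `lesson1["name"]` in a single step keeps the whole normalization in one place, which makes it easier to reuse for each new lesson file.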
Then I want to replace the 'name' values with the reference name where necessary. I've tried the following code on 2 lists:
bad = lesson1['name']
good = reference['name']

def changenames(lesson_list, reference_list):
    for i, name in enumerate(lesson_list):
        for j, ref in enumerate(reference_list):
            if ref in name:
                lesson_list[i] = ref

changenames(bad, good)
but 1/ it's not working and raises a SettingWithCopyWarning, 2/ I can't manage to apply it to a column of the dataframe.
Could you help me? Thanks, L.
Upvotes: 0
Views: 59
Reputation: 9
I've found a way.
I have 2 dataframes:
- the reference_list dataframe, with the names of the students. It has a column 'name'.
- the lesson dataframe, with the names as the students type them when they answer the MCQs (not standardized) and the answers to the MCQs.
To transform the names of the students in the lesson dataframe, based on the well-typed names in reference_list['name'], I have used:
for i in lesson['name']:
    for ref in reference_list['name']:
        if ref in i:
            lesson.loc[lesson['name'] == i, 'name'] = ref
and it works fine. After that, you can apply functions to handle duplicates, merge data, etc.
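For anyone who wants this as a reusable function, here is a rough sketch of the same substring idea applied with `Series.map`, so the dataframe is not modified while it is being iterated. The helper `match_reference` and the sample rows (including the 'score' column) are made up for illustration. Note one difference from the loop above: this version keeps the first reference name that matches, and leaves the name unchanged when nothing matches. Substring matching can also pick the wrong student when one reference name is contained in another, so it is worth checking the result.

```python
from typing import Sequence

import pandas as pd


def match_reference(name: str, references: Sequence[str]) -> str:
    """Return the first reference contained in `name`, or `name` unchanged."""
    for ref in references:
        if ref in name:
            return ref
    return name


# Hypothetical sample data, for illustration only
reference_list = pd.DataFrame({"name": ["doe", "smith"]})
lesson = pd.DataFrame(
    {"name": ["johndoe", "janesmith", "unknown"], "score": [3, 5, 2]}
)

refs = reference_list["name"].tolist()
# Build the new column in one pass and assign it back to the dataframe
lesson["name"] = lesson["name"].map(lambda n: match_reference(n, refs))
print(lesson["name"].tolist())  # ['doe', 'smith', 'unknown']
```

Names left unchanged (like 'unknown' here) are easy to spot afterwards with `~lesson["name"].isin(refs)`.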
I found help in this thread: "Replace single value in a pandas dataframe, when index is not known and values in column are unique".
Hope it'll help some of you. Louis
Upvotes: 0