Reputation: 109
I have a set of survey data that contains more than 100 columns, and most of them share a duplicated column name but hold different values.
My objective is to write code that automatically groups all columns with the same name, no matter how many columns the file contains, as in the sample below:
I've tried ffill, but I can't find a way to make it stop when the column name changes. Can anybody please teach me how to do this?
Thank you. Best regards, Railey Shahril
Upvotes: 1
Views: 60
Reputation: 862721
If a group can hold multiple values per row and you need only the last non-missing value, use the code below.
The idea is to group by the duplicated column names, forward fill the missing values, and select the last column of each group in a lambda function:
df = df.groupby(level=0, axis=1).apply(lambda x: x.ffill(axis=1).iloc[:, -1])
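To make the idea concrete, here is a minimal sketch of the same steps written without `groupby(..., axis=1)`, which newer pandas releases deprecate. The column names `Q1` and `Q2` and the values are made up for illustration; they are not from the original survey file.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data: "Q1" appears three times and "Q2" twice.
df = pd.DataFrame(
    [[1.0, 2.0, np.nan, 7.0, np.nan],
     [np.nan, np.nan, 3.0, np.nan, 8.0],
     [4.0, np.nan, np.nan, np.nan, np.nan]],
    columns=["Q1", "Q1", "Q1", "Q2", "Q2"],
)

# For each distinct column name: select all columns with that name,
# forward fill along the row, and keep the last column of the block.
merged = pd.DataFrame({
    name: df.loc[:, df.columns == name].ffill(axis=1).iloc[:, -1]
    for name in df.columns.unique()
})

print(merged)
```

Because each name is handled as its own block of columns, the forward fill never carries a value from one column name into the next, which is the problem described in the question.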
If each row has only one non-missing value per group and you need that last value:
df = df.groupby(level=0, axis=1).last()
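As a hedged sketch for pandas versions where `axis=1` in `groupby` is deprecated or unavailable, the same result can be obtained by transposing, grouping on the index, and transposing back. The sample frame below is invented for illustration, and transposing assumes the columns share a compatible dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data: each row has at most one answer per repeated name.
df = pd.DataFrame(
    [[1.0, np.nan, np.nan, 5.0],
     [np.nan, 2.0, 6.0, np.nan],
     [3.0, np.nan, np.nan, np.nan]],
    columns=["Q1", "Q1", "Q2", "Q2"],
)

# Transpose so the duplicated names become the index, group on that index,
# take the last non-missing value in each group, then transpose back.
merged = df.T.groupby(level=0).last().T

print(merged)
```

A row that has no value at all for a given name stays missing (NaN) in the merged column.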
Upvotes: 1