Combing multiple columns into single column with name of the column as value

Question

I have a data set that I would like to reshape part of the results. The data set always starts with the first few columns and is followed by a variable number of columns that group the data. If the key belongs to that group, it will be marked by an x. Each key will not belong to more than one group. The data structure is like this:

Key  Date Added Group1Name Group2Name Group3Name ... GroupXName
1    1/1/2018   x
2    1/1/2018               x
3    1/1/2018                          x
4    1/1/2018   x 
5    1/1/2018                                         x

I want to reformat as:

Key  Date Added Group
1    1/1/2018   Group1Name
2    1/1/2018   Group2Name           
3    1/1/2018   Group3Name     
4    1/1/2018   Group1Name
5    1/1/2018   GroupXName

Sven Harris · Accepted Answer

We can first convert this into an easier to use binary format:

group_cols = df.columns.difference(["Key", "Date Added"])
df[group_cols] = df[group_cols].replace({"":0, "x":1})

Then get max id in axis 1 i.e. the column in which we have the highest value, which will be 1

df["Group"] = df[group_cols].idxmax(axis=1)

And drop the original columns

df = df.drop(group_cols, axis=1)

Combing multiple columns into single column with name of the column as value

Answers (1)

Related Questions