Aloysius
Aloysius

Reputation: 33

How do I apply melt / wide_to_long onto a data frame with multiple columns?

I have a one-hot encoded data frame that is similar to the one below:

review_score Action & Adventure Classic Movies TV Comedies TV Mysteries
4 0 0 1 0
10 1 0 0 0
5 1 0 0 0
7 0 1 0 0
8 0 0 1 0
7 0 0 0 1

it contains 40+ columns and I do not think it would be feasible to put every column name into the functions.

How can I pivot or rearrange the table so it looks something like the one below:

Genre review_score
Action & Adventure 10
Action & Adventure 5
Classic Movies 7
TV Comedies 4
TV Comedies 8
TV Mysteries 7

I want each genre to have a row for each review score given in order to plot a boxplot

I have tried melt and wide_to_long functions but cannot get the desired data frame. Please advice!

Upvotes: 2

Views: 159

Answers (4)

phœnix
phœnix

Reputation: 367

hello i hope that help you, i work it with for loop

l=[]
k=[]
for i in df.columns[1:]:
    for j in range(len(df)):
        if df[i][j]==1:
            l.append(df.review_score[j])
            k.append(i)
result= pd.DataFrame()
result['Genre']=k
result['review_score']=l

Out put :

              Genre  review_score
0  Action_Adventure            10
1  Action_Adventure             5
2     ClassicMovies             7
3        TVComedies             4
4        TVComedies             8
5       TVMysteries             7

Upvotes: 0

I'mahdi
I'mahdi

Reputation: 24059

You can use iloc and idxmax(axis=1) to find column_name then use pandas.Series.to_frame() and pandas.DataFrame.assign to add new column to old dataframe.

new_df = df['review_score'].to_frame().assign(Genre = df.iloc[:, 1:].idxmax(axis=1))
print(new_df)

Output:

   review_score               Genre
0             4         TV Comedies
1            10  Action & Adventure
2             5  Action & Adventure
3             7      Classic Movies
4             8         TV Comedies
5             7        TV Mysteries

Upvotes: 1

SomeDude
SomeDude

Reputation: 14238

If you don't have 1.5.0+ version of pandas, you could do:

cols = ['review_score','Genre']
s = df.set_index('review_score').stack()
out = s[s==1].rename_axis(cols).reset_index()[cols]

print(out)

   review_score               Genre
0             4         TV Comedies
1            10  Action & Adventure
2             5  Action & Adventure
3             7      Classic Movies
4             8         TV Comedies
5             7        TV Mysteries

Upvotes: 1

Chrysophylaxs
Chrysophylaxs

Reputation: 6583

In pandas 1.5.0+ there is pd.from_dummies

import pandas as pd

genre = pd.from_dummies(df.drop(columns="review_score"))
pd.concat([genre[""].rename("Genre"), df["review_score"]], axis=1)

alternatively, use pd.DataFrame.idxmax

genre = df.drop(columns="review_score").idxmax(axis=1)
pd.concat([genre.rename("Genre"), df["review_score"]], axis=1)

Upvotes: 2

Related Questions