Muhammad Anas Raza

Reputation: 308

How to convert multiple columns in a pandas DataFrame into one column using label encoding

I am building a movie recommender using machine learning with Python.
The CSV data has the following form:

Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

Movie_recommendation_data = pd.read_csv('movies.csv', index_col='Movie Name')
X = Movie_recommendation_data[['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']]
y = Movie_recommendation_data['Movie ID']
clf = KNeighborsClassifier(n_neighbors=5)

How can I convert the genre columns (Biography through History) into a single column that I can label-encode to easily determine the genres? Any suggestions?

Upvotes: 0

Views: 364

Answers (1)

UltralordTaha

Reputation: 26

MERGED_COLUMN = pd.concat([pd.Series(Movie_recommendation_data.Biography),
                           pd.Series(Movie_recommendation_data.Drama),
                           pd.Series(Movie_recommendation_data.Thriller),
                           pd.Series(Movie_recommendation_data.Comedy),
                           pd.Series(Movie_recommendation_data.Crime),
                           pd.Series(Movie_recommendation_data.Mystery),
                           pd.Series(Movie_recommendation_data.History)],
                          ignore_index=True)

This produces a single column (a Series) in which the seven genre columns are stacked end to end, so it has seven times as many rows as the original DataFrame.
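To see what this stacking does, here is a minimal sketch on a toy frame built from the two sample rows in the question. It uses only three of the genre columns for brevity, and the name `df` is made up for illustration.

```python
import pandas as pd

# Toy frame from the question's two sample rows (three genres only).
df = pd.DataFrame(
    {"Biography": [1, 0], "Drama": [1, 1], "Thriller": [1, 0]},
    index=["The Imitation Game", "Ex Machina"],
)

# Stack the columns end to end. ignore_index=True replaces the movie
# names with a fresh 0..n-1 index, so the movie names are lost here.
merged = pd.concat([df["Biography"], df["Drama"], df["Thriller"]], ignore_index=True)

print(merged.tolist())  # [1, 0, 1, 1, 1, 0]
print(len(merged))      # 6, i.e. 3 genres x 2 movies
```

Note that the result no longer records which movie or which genre each value came from, which is why a separate label column is needed.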

To attach the corresponding label to each value, first count the genres; in your case there are 7. Then build a new column in which each genre label is repeated once per movie, in the same order in which the columns were concatenated.
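The labelling step could be sketched as follows. Because the stacked column holds all values of the first genre, then all values of the second, and so on, each label has to be repeated once per movie to line up. The toy frame, the `genres` list and the column names `Genre` and `Value` are assumptions for illustration, and `numpy.repeat` is just one way to build the label column.

```python
import numpy as np
import pandas as pd

genres = ["Biography", "Drama", "Thriller"]  # shortened list for the example
df = pd.DataFrame(
    {"Biography": [1, 0], "Drama": [1, 1], "Thriller": [1, 0]},
    index=["The Imitation Game", "Ex Machina"],
)

# Stack the genre columns in a fixed order.
values = pd.concat([df[g] for g in genres], ignore_index=True)

# Repeat each genre name once per movie, in the same order as the stack.
labels = np.repeat(genres, len(df))

long_df = pd.DataFrame({"Genre": labels, "Value": values})
print(long_df)
```

The `Genre` column is a plain string column, so it can then be passed to an encoder such as scikit-learn's `LabelEncoder` if integer codes are needed.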

Hope this helps. :)

Upvotes: 1
