Reputation: 308
I am building a movie recommender using machine learning with Python.
The CSV data has this form:
Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0
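To check the layout, the two sample rows can be loaded straight from a string. This is a minimal sketch that uses `io.StringIO` in place of `movies.csv` (the variable names `CSV_TEXT`, `movies` and `genre_cols` are illustrative); it shows that every genre is its own 0/1 column and that one movie can have several genres set to 1.

```python
import io

import pandas as pd

# The two sample rows from the question, inlined instead of reading movies.csv
CSV_TEXT = """Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0
"""

movies = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Movie Name')

# Each genre is a separate 0/1 indicator column; a movie may have several 1s
genre_cols = ['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']
print(movies[genre_cols])
```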
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
Movie_recommendation_data = pd.read_csv('movies.csv', index_col='Movie Name')
# Genre columns (0/1 flags) used as features
X = Movie_recommendation_data[['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']]
y = Movie_recommendation_data['Movie ID']
clf = KNeighborsClassifier(n_neighbors=5)
How can I combine the genre columns (Biography, Drama, Thriller, Comedy, Crime, Mystery and History) into a single column so that I can use label encoding to identify the genres easily? Any suggestions?
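One way to get a single label-encodable column is to join the names of each movie's genres into one string and encode that string. This is a minimal sketch on the two sample rows, assuming scikit-learn's `LabelEncoder`; the `Genres` and `Genre Code` column names and the `|` separator are illustrative choices.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

CSV_TEXT = """Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0
"""
movies = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Movie Name')
genre_cols = ['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']

# Join the names of the genres flagged 1 into one string per movie,
# e.g. "Biography|Drama|Thriller"
movies['Genres'] = movies[genre_cols].apply(
    lambda row: '|'.join(g for g in genre_cols if row[g] == 1), axis=1)

# Map each distinct genre combination to an integer code (classes are sorted)
encoder = LabelEncoder()
movies['Genre Code'] = encoder.fit_transform(movies['Genres'])
print(movies[['Genres', 'Genre Code']])
```

Each distinct combination gets its own arbitrary integer, so two movies that share most genres can still receive distant codes. For a distance-based model such as `KNeighborsClassifier`, the original 0/1 genre columns usually carry more useful similarity information than this single code.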
Upvotes: 0
Views: 364
Reputation: 26
import pandas as pd
genres = ['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']
# Stack the genre columns end to end into one Series:
# all Biography values first, then all Drama values, and so on
MERGED_COLUMN = pd.concat([Movie_recommendation_data[g] for g in genres],
                          ignore_index=True)
This produces a single column (a Series) that holds the values of all seven genre columns stacked one after another, so its length is 7 times the number of movies.
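A minimal sketch of the stacking on the two sample rows, using `io.StringIO` instead of `movies.csv` so it runs on its own. It shows the order of the merged values: genre by genre, with every movie listed inside each genre.

```python
import io

import pandas as pd

CSV_TEXT = """Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0
"""
Movie_recommendation_data = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Movie Name')
genres = ['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']

# Stack the genre columns end to end, as in the answer
MERGED_COLUMN = pd.concat([Movie_recommendation_data[g] for g in genres],
                          ignore_index=True)

# 7 genres x 2 movies -> 14 values: Biography for both movies,
# then Drama for both movies, and so on
print(MERGED_COLUMN.tolist())
```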
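To attach the corresponding Label to each value, first count the number of genres; in your case there are 7. Then repeat the whole Label column 7 times (one copy per genre) in a new column. Because the merged column is ordered genre by genre, the labels line up only if the column is repeated as a block, not if each label is repeated 7 times in a row.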
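A minimal sketch of the label step on the two sample rows, assuming `Label` is a per-movie value as in the CSV. `np.tile` repeats the per-movie columns as whole blocks, once per genre, while `np.repeat` repeats each genre name once per movie. The `Genre` and `Movie ID` columns and the name `long_df` are illustrative additions that make the alignment easy to check.

```python
import io

import numpy as np
import pandas as pd

CSV_TEXT = """Movie ID,Movie Name,IMDB Rating,Biography,Drama,Thriller,Comedy,Crime,Mystery,History,Label
58,The Imitation Game,8,1,1,1,0,0,0,0,0
8,Ex Machina,7.7,0,1,0,0,0,1,0,0
"""
Movie_recommendation_data = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Movie Name')
genres = ['Biography', 'Drama', 'Thriller', 'Comedy', 'Crime', 'Mystery', 'History']
MERGED_COLUMN = pd.concat([Movie_recommendation_data[g] for g in genres],
                          ignore_index=True)

n_genres = len(genres)  # 7 in this data set
n_movies = len(Movie_recommendation_data)

# MERGED_COLUMN is ordered genre by genre, so per-movie columns are
# repeated as whole blocks, once per genre, to stay aligned
labels = np.tile(Movie_recommendation_data['Label'].to_numpy(), n_genres)
movie_ids = np.tile(Movie_recommendation_data['Movie ID'].to_numpy(), n_genres)
# Each genre name is repeated once per movie
genre_names = np.repeat(genres, n_movies)

long_df = pd.DataFrame({
    'Movie ID': movie_ids,
    'Genre': genre_names,
    'Value': MERGED_COLUMN,
    'Label': labels,
})
print(long_df)
```

pandas' `DataFrame.melt` (with `id_vars` and `value_vars`) builds the same genre-by-genre long layout in one call, which avoids keeping the tiled columns aligned by hand.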
Hope this helps. :)
Upvotes: 1