Reputation: 21
I am learning Python and pandas via Wes McKinney's Python for Data Analysis. One of the examples in Chapter 2, a merge of the MovieLens data on movie_id, is not working. I think the issue is that movie_id is an int64 in ratings and an object in movies. The merge returns an empty DataFrame.
I have read some of the previous posts on pandas and automatic data type assignment, and found the dtype argument in the pandas.io.parsers.read_table documentation, but I can't get the type to change.
The original code:
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames)
And what my research indicated should work:
movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames, dtype={'movie_id':np.int64})
Unfortunately, the type isn't changed and the merge still returns an empty result. I am running pandas 0.10.1.
Upvotes: 2
Views: 2403
Reputation: 48682
(Note: I haven't looked up the book's code, just your post.)
First confirm the dtypes:
print(ratings_df.dtypes)
print(movies_df.dtypes)
If you find they're different types, you could try converting one of them (let's assume ratings_df.movie_id is object instead of int):
ratings_df.movie_id = ratings_df.movie_id.astype(int)
See if your merge now works.
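To make the idea concrete, here is a minimal sketch of the convert-then-merge approach. The two small frames are made-up stand-ins for the MovieLens tables (not the real data), and here I assume it is movies_df.movie_id that holds strings, as the question describes. I use "int64" explicitly so both key columns end up with exactly the same dtype.

```python
import pandas as pd

# Hypothetical stand-ins for the MovieLens tables; the values are made up.
ratings_df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 20, 10],   # integer keys
    "rating": [5, 3, 4],
})
movies_df = pd.DataFrame({
    # Keys stored as strings in an object column, mimicking the mismatch.
    "movie_id": pd.Series(["10", "20", "30"], dtype=object),
    "title": ["A", "B", "C"],
})

# Confirm the mismatch first.
print(ratings_df.dtypes)
print(movies_df.dtypes)

# Convert the string keys to int64 so both key columns share a dtype.
movies_df["movie_id"] = movies_df["movie_id"].astype("int64")

# Inner join on the now-matching key column.
merged = pd.merge(ratings_df, movies_df, on="movie_id")
print(merged)
```

If astype raises an error, the column probably contains values that are not clean integers, which is worth inspecting before retrying the merge.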
Upvotes: 2