wayne

Reputation: 21

Pandas dataframe merge issue

I am learning Python and pandas via Wes McKinney's Python for Data Analysis. One of the examples in Chapter 2, a merge of the MovieLens data on movie_id, is not working. I think the issue is that movie_id is an int64 in ratings and an object in movies. The merge returns an empty DataFrame.

I have read some of the previous posts on pandas and automatic data type assignment, and I found the dtype argument in the pandas.io.parsers.read_table documentation, but I can't get the type to change.

The original code:

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames)

And what my research indicated should work:

movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames, dtype={'movie_id':np.int64})

Unfortunately, the type isn't changed and the merge still returns an empty result. I am running pandas 0.10.1.

Upvotes: 2

Views: 2403

Answers (1)

lexual

Reputation: 48682

(Note: I haven't looked up the book code, just your post.)

First confirm the dtypes:

print(ratings_df.dtypes)
print(movies_df.dtypes)

If you find they're different types, you could try the following (let's assume ratings_df.movie_id is object instead of int):

ratings_df.movie_id = ratings_df.movie_id.astype(int)

See if your merge now works.
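
To make the steps above concrete, here is a minimal sketch of the check-then-cast approach. The two small frames and their values are made up for illustration and only stand in for the MovieLens tables. In this sketch the string-typed key is on the movies side, as described in the question, so that is the column that gets cast. Newer pandas versions may report an error rather than an empty result when the key dtypes differ, so the cast is done before merging.

```python
import pandas as pd

# Hypothetical stand-ins for the MovieLens tables; all values are invented.
ratings_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "movie_id": [10, 20, 10],        # integer keys
    "rating": [5, 3, 4],
})
movies_df = pd.DataFrame({
    "movie_id": ["10", "20", "30"],  # same keys, but stored as strings
    "title": ["A", "B", "C"],
})

# Step 1: confirm the key columns have different dtypes.
print(ratings_df.dtypes)
print(movies_df.dtypes)

# Step 2: cast the string key to int so both sides share an integer dtype.
movies_df["movie_id"] = movies_df["movie_id"].astype(int)

# Step 3: merge on the now-compatible key (inner join by default).
merged = pd.merge(ratings_df, movies_df, on="movie_id")
print(merged)
```

If the cast raises a ValueError, the column probably contains values that are not clean integers (stray whitespace, header rows, and so on), which is worth inspecting before retrying.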

Upvotes: 2
