Mus

Reputation: 7540

Empty DataFrame in pandas after merge

I am following the examples in Python for Data Analysis by Wes McKinney and keep running into a problem: after I merge the DataFrames I have created, the merged DataFrame is empty, even though each of the component DataFrames contains data.

The code is:

data = pd.merge(pd.merge(ratings, users), movies)

When I then inspect data, it is empty:

Empty DataFrame
Columns: [user_id, rating, timestamp, gender, age, occupation, zip, movie_id, title, genres]
Index: []

When I check the component DFs, they all contain data:

In [65]: len(ratings)
Out[65]: 1000209

In [66]: len(movies)
Out[66]: 3883

In [67]: len(users)
Out[67]: 6040

Why is this happening and how can I fix it?
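One way an inner merge of two populated DataFrames can come back empty is that the key values never match, even when the column names and dtypes do. A minimal sketch with hypothetical toy data (not the MovieLens files; the leading-space mismatch is an assumed cause for illustration): pd.merge with no on argument joins on every shared column name and defaults to how="inner", and the indicator=True parameter shows which side each key came from.

```python
import pandas as pd

# Hypothetical toy frames: both key columns hold strings, but the
# right-hand values carry a leading space, so no key value matches.
ratings = pd.DataFrame({"movie_id": ["1", "2", "2"], "rating": [5, 3, 4]})
movies = pd.DataFrame({"movie_id": [" 1", " 2"], "title": ["Movie A", "Movie B"]})

# With no `on`, pd.merge joins on every column name the frames share
# (here only movie_id) and defaults to how="inner".
merged = pd.merge(ratings, movies)
print(merged.empty)  # True, although both inputs have rows

# how="outer" plus indicator=True adds a `_merge` column saying whether
# each row's key was found in the left frame, the right frame, or both.
diagnostic = pd.merge(ratings, movies, how="outer", indicator=True)
print(diagnostic["_merge"].value_counts())

# Normalising the key values makes the inner merge non-empty again.
movies["movie_id"] = movies["movie_id"].str.strip()
print(len(pd.merge(ratings, movies)))  # 3
```

If the _merge counts show only left_only and right_only rows, the keys need cleaning before the merge.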

Upvotes: 0

Views: 1957

Answers (1)

pantry_cat

Reputation: 11

I had a similar issue when trying to merge on two indexes. Without more of your code I can't tell whether that is your problem, i.e. are you sure you have commented out any earlier code that might set the indexes of both DataFrames to values that can't match? As suggested in the comments, I also tried to solve it by making sure my key columns had the same dtype (see code below). Both were already object dtype, so that wasn't the cause.

print(df_movies.dtypes)
print(df_ratings.dtypes)
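Matching dtypes alone does not guarantee that any rows will join; the values themselves also have to overlap. A minimal sketch of a check that looks at both, using a hypothetical helper name (compare_merge_keys) and toy data:

```python
import pandas as pd


def compare_merge_keys(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Hypothetical helper: for every column the two frames share (the keys
    pd.merge uses when `on` is not given), report both dtypes and how many
    distinct non-null values appear in both frames."""
    report = {}
    for col in left.columns.intersection(right.columns):
        shared = set(left[col].dropna().unique()) & set(right[col].dropna().unique())
        report[col] = {
            "left_dtype": str(left[col].dtype),
            "right_dtype": str(right[col].dtype),
            "shared_values": len(shared),
        }
    return report


# Toy example: same dtype on both sides, yet zero shared key values
# because of trailing spaces in the movies frame.
df_ratings = pd.DataFrame({"movie_id": ["1", "2"], "rating": [5, 3]})
df_movies = pd.DataFrame({"movie_id": ["1 ", "2 "], "title": ["Movie A", "Movie B"]})
print(compare_merge_keys(df_movies, df_ratings))
```

A shared_values count of 0 for every key column means an inner merge will be empty regardless of dtype.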

Then I tried setting the index of both DataFrames to the column I was matching on. This worked for me:

df_movies = df_movies.set_index('movie_id')
df_ratings = df_ratings.set_index('movie_id')

df_mov_rating = pd.merge(df_movies, df_ratings, left_index=True, right_index=True)
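For comparison, the index-on-index merge above can be checked against an ordinary merge on the movie_id column. A minimal sketch with hypothetical toy frames standing in for df_movies and df_ratings: when the key values match, both forms give the same rows. If the key values themselves differ, both forms return an empty result, so the index approach mainly helps rule out mismatched indexes left over from earlier code.

```python
import pandas as pd

# Toy stand-ins for the answer's df_movies / df_ratings (hypothetical data).
df_movies = pd.DataFrame({"movie_id": [1, 2], "title": ["Movie A", "Movie B"]})
df_ratings = pd.DataFrame({"movie_id": [1, 1, 2], "rating": [5, 4, 3]})

# Index-on-index merge, as in the answer.
by_index = pd.merge(
    df_movies.set_index("movie_id"),
    df_ratings.set_index("movie_id"),
    left_index=True,
    right_index=True,
)

# Equivalent column-on-column merge; `on` names the key explicitly.
by_column = pd.merge(df_movies, df_ratings, on="movie_id")

# Move the key out of the index (naming it explicitly) and put both
# results in the same row order so they can be compared directly.
a = (
    by_index.rename_axis("movie_id")
    .reset_index()
    .sort_values(["movie_id", "rating"])
    .reset_index(drop=True)
)
b = by_column.sort_values(["movie_id", "rating"]).reset_index(drop=True)
print(a)
print(b)
```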

Upvotes: 1
