S Coe
S Coe

Reputation: 85

Merging two pandas dataframes

I've seen this question a few times, but the answers are not working for me. I have two dataframes, split_df and csv_df that I;m trying to merge on a column that is called key in each of them.

Here's the top 5 items from split_df

        key        tags
2785  30000        ipv6
2785  30000  networking
2785  30000    wireless
2785  30000   standards
2785  30000      satcom

Here's the top 5 items from the csv_df

     key
0  30000
1  30002
2  31288
3  33630
4  31663

When I run this merge statement:

common_df = pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))

I get:

Empty DataFrame
Columns: [key, tags]
Index: []

Process finished with exit code 0

I can't figure out why the merge isn't working for me. Any suggestions?

I have indexes on both (and have run the merge with them out as well)

csv_df.set_index('key')
split_df.set_index('key')

I am very new to Pyton and Pandas

Thanks!

Upvotes: 3

Views: 111

Answers (2)

piRSquared
piRSquared

Reputation: 294536

The key column in one of your dataframes is int while the other is str

split_df.key = split_df.key.astype(int)
csv_df.key = csv_df.key.astype(int)

pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))

Upvotes: 3

Andy Hayden
Andy Hayden

Reputation: 375915

From the above code this works for me:

In [11]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[11]:
     key        tags
0  30000        ipv6
1  30000  networking
2  30000    wireless
3  30000   standards
4  30000      satcom

My suspicion is that the column type key is different in your dataframes (and the same int in mine).

Most likely one of them is a string (and the other a float/int).

In [12]: split_df["key"] = split_df["key"].apply(str)

In [13]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[13]:
Empty DataFrame
Columns: [key, tags]
Index: []

If they're both strings:

In [14]: csv_df["key"] = csv_df["key"].apply(str)

In [15]: pd.merge(split_df,csv_df, on=['key'], suffixes=('_left', '_right'))
Out[15]:
     key        tags
0  30000        ipv6
1  30000  networking
2  30000    wireless
3  30000   standards
4  30000      satcom

Upvotes: 3

Related Questions