tipanverella
tipanverella

Reputation: 3667

pandas merge from mysql datatype mismatch issue

Consider the following:

import pandas as pd

df1 = pd.io.sql.read_frame("select c1,c2,c3 from table1;",mysqldb1)
df2 = pd.io.sql.read_frame("select c1,c4 from table1;",mysqldb2)
df =  pd.merge(df1,df2,on='c1',how='inner')

This works most of the time. I believe I am having a problem because in mysqldb1 c1 is a varchar whereas in mysqldb2 c1 is an INTEGER. I can't change this state of affairs. But now, because

df1.c1.dtype != df2.c1.dtype

pd.merge returns an empty dataframe.

I have tried casting, i.e.

df2['c1'] = df2.c1.astype(object) 

and then tried the merge, to no avail.

Upvotes: 0

Views: 179

Answers (1)

cwharland
cwharland

Reputation: 6703

Usually in cases of SQL reads to a dataframe the varchar columns are read as raw strings but pandas lists all string columns as "object."

The quick fix in this situation is to convert the c1 column to string.

df['c1'] = df.c1.astype(string)

This will cast all of your int c1 values to string.

A deeper fix, if you think all c2 values should be cast to integers, would be to write a function that goes row by row casting as int and explicitly deals with c2 values that have difficulties casting. But it's not entirely needed.

Upvotes: 1

Related Questions