Reputation: 21254
I have a dataframe that I read in using PySpark with:
df1 = spark.read.csv("/user/me/data/*").toPandas()
Unfortunately, PySpark leaves all the types as object, even for numerical values. I need to merge this with another dataframe that I read in with df2 = pd.read_csv("file.csv"), so I need the types in df1 to be inferred exactly as pandas would have inferred them. How can I infer the types of an existing pandas dataframe?
Upvotes: 3
Views: 3679
Reputation: 294218
If you have the same column names, you can use pd.DataFrame.astype:
df1 = df1.astype(df2.dtypes)
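A minimal sketch of the same-column-names case, using small hypothetical frames in place of the asker's data (df1 stands in for the all-object result of toPandas, df2 for a normally typed pd.read_csv result):

```python
import pandas as pd

# Hypothetical stand-ins: df1 holds object (string) columns, as Spark's
# toPandas produces here; df2 has pandas-inferred numeric dtypes.
df1 = pd.DataFrame({"id": ["1", "2"], "price": ["1.5", "2.5"]})
df2 = pd.DataFrame({"id": [3, 4], "price": [3.5, 4.5]})

# df2.dtypes is a Series mapping column name -> dtype; astype accepts it
# like a dict, so df1 adopts df2's dtypes column by column.
df1 = df1.astype(df2.dtypes)
print(df1.dtypes)
```

After the cast, df1's id and price columns carry the same numeric dtypes as df2's, so a subsequent merge compares like with like.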
Otherwise, construct a dictionary whose keys are the column names in df1 and whose values are the dtypes. You can start with d = df2.dtypes.to_dict() to see what it should look like, then build a new dictionary, altering the keys where needed.
Once you've constructed the dictionary d, use:
df1 = df1.astype(d)
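A sketch of the differing-column-names case. The frames and the ID/PRICE renaming below are hypothetical, purely to illustrate re-keying the dtype dictionary:

```python
import pandas as pd

# Hypothetical example: df1's columns (all object) are named differently
# from df2's, so df2.dtypes can't be applied directly.
df1 = pd.DataFrame({"ID": ["1", "2"], "PRICE": ["1.5", "2.5"]})
df2 = pd.DataFrame({"id": [3, 4], "price": [3.5, 4.5]})

# Start from df2's mapping of column name -> dtype...
d = df2.dtypes.to_dict()

# ...then re-key it so the keys match df1's column names.
d = {"ID": d["id"], "PRICE": d["price"]}

df1 = df1.astype(d)
print(df1.dtypes)
```

The intermediate to_dict() call is just a convenient starting point; any dict of df1-column-name to dtype passed to astype achieves the same result.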
Upvotes: 4