Reputation: 20362
I have two DataFrames in Azure Databricks. Both are of type pyspark.sql.dataframe.DataFrame.
The number of rows is the same, and the indexes are the same. I thought one of the code snippets below would do the job.
First Attempt:
result = pd.concat([df1, df2], axis=1)
Error Message: TypeError: cannot concatenate object of type "<class 'pyspark.sql.dataframe.DataFrame'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Second Attempt:
result = pd.merge(df1, df2, left_index=True, right_index=True)
Error Message: TypeError: Can only merge Series or DataFrame objects, a <class 'pyspark.sql.dataframe.DataFrame'> was passed
Upvotes: 0
Views: 5140
Reputation: 1136
I faced a similar issue when combining two DataFrames with the same columns.
df = pd.concat([df, resultant_df], ignore_index=True)
TypeError: cannot concatenate object of type '<class 'pyspark.sql.dataframe.DataFrame'>'; only Series and DataFrame objs are valid
Then I tried join(), but it appended the columns multiple times and returned an empty DataFrame.
df.join(resultant_df)
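After that I used union(), which appends the rows of resultant_df below those of df (columns are matched by position, not by name). That gave exactly the result I needed: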
df = df.union(resultant_df)
df.show()
It works fine in my case.
Upvotes: 2
Reputation: 20362
I ended up converting the two objects to pandas DataFrames and then doing the merge with the pandas technique I already knew.
Step #1:
# toPandas() collects every row to the driver, so both DataFrames must fit in driver memory
df1 = df1.toPandas()
df2 = df2.toPandas()
Step #2:
import pandas as pd
result = pd.concat([df1, df2], axis=1)
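Once both objects are pandas DataFrames, pd.concat(..., axis=1) places their columns side by side and aligns rows on the index labels. The sketch below uses hypothetical stand-in frames instead of real toPandas() output; toPandas() normally yields a default 0..n-1 RangeIndex, which is why the pairing works here. It also shows what happens when the indexes do not match, and how reset_index(drop=True) forces a purely positional pairing:

```python
import pandas as pd

# Hypothetical stand-ins for df1.toPandas() and df2.toPandas(), both with a 0..n-1 index.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"score": [10, 20, 30]})

# axis=1 concatenates column-wise, aligning rows on index labels.
result = pd.concat([df1, df2], axis=1)
print(result)

# With non-matching index labels, concat takes the union of the indexes and fills gaps with NaN.
shifted = pd.DataFrame({"score": [10, 20, 30]}, index=[5, 6, 7])
misaligned = pd.concat([df1, shifted], axis=1)

# Resetting both indexes pairs rows purely by position.
positional = pd.concat(
    [df1.reset_index(drop=True), shifted.reset_index(drop=True)], axis=1
)
```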
Done!
Upvotes: 3