Trying to Merge or Concat two pyspark.sql.dataframe.DataFrame in Databricks Environment

Question

I have two dataframes in Azure Databricks. Both are of type: pyspark.sql.dataframe.DataFrame

The number of rows is the same and the indexes match, so I thought one of the code snippets below would do the job.

First Attempt:

result = pd.concat([df1, df2], axis=1)


Error Message: TypeError: cannot concatenate object of type "<class 'pyspark.sql.dataframe.DataFrame'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

Second Attempt:

result = pd.merge(df1, df2, left_index=True, right_index=True)

Error Message: TypeError: Can only merge Series or DataFrame objects, a <class 'pyspark.sql.dataframe.DataFrame'> was passed

ASH · Accepted Answer

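pd.concat and pd.merge only accept pandas objects, which is why both attempts fail on Spark DataFrames. I ended up converting the two objects to pandas DataFrames and then combining them with the technique I already knew.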

Step #1:

# toPandas() collects every row to the driver, so both DataFrames must fit in driver memory.
df1 = df1.toPandas()
df2 = df2.toPandas()

Step #2:

import pandas as pd
result = pd.concat([df1, df2], axis=1)
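One caveat worth illustrating: pd.concat with axis=1 aligns rows on index labels, not on position. As far as I know, toPandas() returns frames with a default 0..n-1 index, so the rows pair up by position, but if either frame has been re-indexed or filtered in between, the result can silently grow and fill with NaN. The sketch below uses small made-up frames (left and right are hypothetical stand-ins, not the real df1 and df2) to show the difference and a defensive reset_index.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames produced by toPandas().
left = pd.DataFrame({"id": [1, 2, 3]})
# Deliberately give the second frame a non-default index.
right = pd.DataFrame({"score": [0.5, 0.7, 0.9]}, index=[10, 11, 12])

# Index labels differ, so concat takes the union of labels and pads with NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes makes concat pair rows purely by position.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)

print(misaligned.shape)  # (6, 2)
print(aligned.shape)     # (3, 2)
```

If the frames come straight out of toPandas(), the reset is probably redundant, but it is cheap insurance when row counts are expected to match.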

Done!
