Reputation: 2112
I have 4 dataframes, each with only one row and one column, and I would like to combine them into one dataframe. In Python I would do this using the zip function, but I need a way to do it in PySpark. Any suggestions?
Dataframes look like this:
+--------------------------+
|sum(sum(parcelUBLD_SQ_FT))|
+--------------------------+
| 1.13014806E8|
+--------------------------+
+---------------------+
|sum(parcelUBLD_SQ_FT)|
+---------------------+
| 1.13014806E8|
+---------------------+
+---------------+
|count(parcelID)|
+---------------+
| 45932|
+---------------+
+----------------+
|sum(parcelCount)|
+----------------+
| 45932|
+----------------+
and I would like it to look like this:
+--------------------------+---------------------+---------------+----------------+
|sum(sum(parcelUBLD_SQ_FT))|sum(parcelUBLD_SQ_FT)|count(parcelID)|sum(parcelCount)|
+--------------------------+---------------------+---------------+----------------+
| 1.13014806E8| 1.13014806E8| 45932| 45932|
+--------------------------+---------------------+---------------+----------------+
Upvotes: 0
Views: 110
Reputation: 1932
Since you clearly specified that all the dataframes have one row, you can use a cross join to get the desired output:
df1.crossJoin(df2).crossJoin(df3).crossJoin(df4)
Upvotes: 1