Reputation: 57
Example: I have a PySpark dataframe:
df =
x_data  y_data
2.5     1.5
3.5     8.5
4.5     89.5
5.5     20.5
Let's say I have some calculation to be done on each column of df, which I do inside a for loop. After that, my final output should look like this:
df_output =
cal_1  cal_2  cal_3  cal_4  Datatype
23     24     34     36     x_data
12     13     18     90     x_data
23     54     74     96     x_data
41     13     38     50     x_data
53     74     44     6      y_data
72     23     28     50     y_data
43     24     44     66     y_data
41     23     58     30     y_data
How do I append the results calculated for each column into the same output dataframe inside the for loop?
Upvotes: 0
Views: 6475
Reputation: 32640
You can use functools.reduce to union the list of dataframes created in each iteration. Something like this:
import functools
from pyspark.sql import DataFrame

output_dfs = []
for c in df.columns:
    # do some calculation on column c
    df_output = _  # calculation result
    output_dfs.append(df_output)

df_output = functools.reduce(DataFrame.union, output_dfs)
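To see how the reduce step chains pairwise unions, here is a minimal plain-Python sketch of the same pattern. The per-column row lists below are made-up stand-ins for the dataframes the loop would produce, and the `union` function simply concatenates lists the way `DataFrame.union` stacks rows:

```python
import functools

def union(a, b):
    # Stand-in for DataFrame.union: stack the rows of b under a.
    return a + b

# Hypothetical per-iteration results, one list of rows per column.
output_dfs = [
    [(23, 24, 34, 36, "x_data")],
    [(53, 74, 44, 6, "y_data")],
]

# reduce applies union pairwise: union(union(df0, df1), df2), ...
combined = functools.reduce(union, output_dfs)
print(combined)
```

One caveat with the real PySpark call: `DataFrame.union` matches columns by position, so every dataframe in the list must have the same schema in the same column order; if the order might differ between iterations, `DataFrame.unionByName` is the safer choice.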
Upvotes: 2