Reputation: 247
I have a function with a for loop that iterates over a list of tables and columns (via zip) to get minimum and maximum values. The output is printed separately for each table/column combination rather than as one single dataframe/table. Is there a way to combine the results of the for loop into one final output within the function?
from pyspark.sql import functions as f

def minmax(tables, cols):
    for table, column in zip(tables, cols):
        minmax = spark.table(table).where(f.col(column).isNotNull()).select(
            f.lit(table).alias("table"),
            f.lit(column).alias("col"),
            f.min(f.col(column)).alias("min"),
            f.max(f.col(column)).alias("max"),
        )
        minmax.show()
tables = ["sales_123", "sales_REW"]
cols = ["costs", "price"]
minmax(tables, cols)
Output from the function:
+---------+-----+---+---+
| table| col|min|max|
+---------+-----+---+---+
|sales_123|costs| 0|400|
+---------+-----+---+---+
+----------+-----+---+---+
| table| col|min|max|
+----------+-----+---+---+
|sales_REW |price| 0|400|
+----------+-----+---+---+
Desired Output:
+---------+-----+---+---+
| table| col|min|max|
+---------+-----+---+---+
|sales_123|costs| 0|400|
|sales_REW|price| 0|400|
+---------+-----+---+---+
Upvotes: 2
Views: 5977
Reputation: 28392
Put all the dataframes into a list and do the union after the for-loop:
from functools import reduce
from pyspark.sql import functions as f
from pyspark.sql import DataFrame

def minmax(tables, cols):
    dfs = []
    for table, column in zip(tables, cols):
        minmax = spark.table(table).where(f.col(column).isNotNull()).select(
            f.lit(table).alias("table"),
            f.lit(column).alias("col"),
            f.min(f.col(column)).alias("min"),
            f.max(f.col(column)).alias("max"),
        )
        dfs.append(minmax)
    return reduce(DataFrame.union, dfs)
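With the return added, calling the function yields a single dataframe with one row per table/column pair, which you can show or process further:

minmax(tables, cols).show()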
Note that the order of the columns needs to be the same for all involved dataframes (as is the case here), since union matches columns by position. Otherwise this can have unexpected results.
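If you cannot guarantee identical column order, one alternative (a sketch, assuming Spark 2.3+) is to reduce with unionByName, which matches columns by name rather than by position:

df = reduce(DataFrame.unionByName, dfs)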
Upvotes: 6