thecoder

Reputation: 247

Merge multiple dataframes produced by a for loop inside a function into a single dataframe

I have a function with a for loop that iterates over a list of tables and a list of columns (paired with zip) to get each column's minimum and maximum values. The output is a separate dataframe for each table/column combination rather than one single dataframe/table. Is there a way to combine the results of the for loop into one final output within the function?

from pyspark.sql import functions as f

def minmax(tables, cols):
    for table, column in zip(tables, cols):
        # f.col, f.min and f.max are the Spark functions; bare min/max would be Python built-ins
        df = spark.table(table).where(f.col(column).isNotNull()).select(
            f.lit(table).alias("table"),
            f.lit(column).alias("col"),
            f.min(f.col(column)).alias("min"),
            f.max(f.col(column)).alias("max"),
        )
        df.show()

tables = ["sales_123", "sales_REW"]
cols = ["costs", "price"]

minmax(tables, cols)

Output from the function:

+---------+-----+---+---+
|    table|  col|min|max|
+---------+-----+---+---+
|sales_123|costs|  0|400|
+---------+-----+---+---+

+---------+-----+---+---+
|    table|  col|min|max|
+---------+-----+---+---+
|sales_REW|price|  0|400|
+---------+-----+---+---+

Desired Output:

+---------+-----+---+---+
|    table|  col|min|max|
+---------+-----+---+---+
|sales_123|costs|  0|400|
|sales_REW|price|  0|400|
+---------+-----+---+---+

Upvotes: 2

Views: 5977

Answers (1)

Shaido

Reputation: 28392

Put all the dataframes into a list and do the union after the for-loop:

from functools import reduce
from pyspark.sql import functions as f
from pyspark.sql import DataFrame

def minmax(tables, cols):
    dfs = []
    for table, column in zip(tables, cols):
        # Build one single-row dataframe per (table, column) pair
        df = spark.table(table).where(f.col(column).isNotNull()).select(
            f.lit(table).alias("table"),
            f.lit(column).alias("col"),
            f.min(f.col(column)).alias("min"),
            f.max(f.col(column)).alias("max"),
        )
        dfs.append(df)
    # Fold the list into one dataframe by chaining union calls, then return it
    return reduce(DataFrame.union, dfs)

Note that the column order needs to be the same in all of the dataframes involved (as is the case here), because union matches columns by position, not by name. Otherwise this can have unexpected results.
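To make the column-order caveat concrete without needing a Spark session, here is a minimal pure-Python sketch. FakeFrame and union_all are hypothetical stand-ins invented for illustration, not part of PySpark; the union method only imitates the positional behaviour described above.

```python
from functools import reduce


class FakeFrame:
    """Hypothetical stand-in for a Spark DataFrame: column names plus row tuples."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def union(self, other):
        # Imitates a positional union: keep the left frame's column names and
        # append the other frame's rows as they are, without matching on names.
        if len(self.columns) != len(other.columns):
            raise ValueError("union requires the same number of columns")
        return FakeFrame(self.columns, self.rows + other.rows)


def union_all(frames):
    # reduce folds left to right: frames[0].union(frames[1]).union(frames[2]) ...
    return reduce(FakeFrame.union, frames)


a = FakeFrame(["table", "col", "min", "max"], [("sales_123", "costs", 0, 400)])
b = FakeFrame(["table", "col", "min", "max"], [("sales_REW", "price", 0, 400)])
combined = union_all([a, b])

# Same values as b, but with the min and max columns in swapped order.
c = FakeFrame(["table", "col", "max", "min"], [("sales_REW", "price", 400, 0)])
mismatched = union_all([a, c])

for frame in (combined, mismatched):
    print(frame.columns)
    for row in frame.rows:
        print(row)
```

In the mismatched case the second row ends up with 400 under "min" and 0 under "max", which is the kind of silent error the note warns about. In real PySpark, DataFrame.unionByName can be passed to reduce instead of DataFrame.union when the column order may differ. Also note that reduce raises a TypeError if the list is empty, so an empty tables list would need its own handling.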

Upvotes: 6
