Reputation:
How to creat a pyspark DataFrame inside of a loop? In this loop in each iterate I am printing 2 values print(a1,a2)
. now I want to store all these value in a pyspark
dataframe.
Upvotes: 0
Views: 1537
Reputation: 66
Initially, before the loop, you could create an empty dataframe with your preferred schema. Then, create a new df for each loop with the same schema and union it with your original dataframe. Refer the code below.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField, StringType
spark = SparkSession.builder.getOrCreate()
schema = StructType([
StructField('a1', StringType(), True),
StructField('a2', StringType(), True)
])
df = spark.createDataFrame([],schema)
for i in range(1,5):
a1 = i
a2 = i+1
newRow = spark.createDataFrame([(a1,a2)], schema)
df = df.union(newRow)
print(df.show())
This gives me the below result where the values are appended to the df in each loop.
+---+---+
| a1| a2|
+---+---+
| 1| 2|
| 2| 3|
| 3| 4|
| 4| 5|
+---+---+
Upvotes: 1