Reputation: 81
How to create dynamic dataframe names in PySpark? I am not able to create a new dataframe for each file using the code below. It gives me only the last dataframe name, but I need all of the dataframe names.
    for prime2 in pdf2:
        ol2 = Bucket_path + prime2['S3_File_with_Path']
        t = 1
        sd = {}
        testR = "df" + str(t)
        print("testR", testR)
        sd[testR] = spark.read.format("parquet").load(ol2).cache()
        t = t + 1
Upvotes: 0
Views: 3778
Reputation: 1859
It looks like you're creating the dict inside the loop, so every iteration replaces it and you end up with a dict holding only one (the last) entry. The counter t is reset to 1 inside the loop as well, so every key would be "df1". Initialize both before the loop, something like this:

    sd = {}
    t = 1
    for prime2 in pdf2:
        ol2 = Bucket_path + prime2['S3_File_with_Path']
        testR = "df" + str(t)
        print("testR", testR)
        df = spark.read.format("parquet").load(ol2).cache()
        sd[testR] = df
        t = t + 1
    # sd dict is available here, all the dataframes are inside
    print(len(sd))
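
If it helps, the same pattern can be written with enumerate, so there is no manual counter to keep track of. This is only a sketch, not a drop-in replacement: load_named_frames, rows, and the "s3://bucket/" paths are hypothetical names made up for illustration, and the loader is a plain stand-in function so the snippet runs without a Spark session. It assumes each row can be indexed by 'S3_File_with_Path', as in the question.

```python
def load_named_frames(rows, bucket_path, load):
    """Return a dict mapping 'df1', 'df2', ... to whatever `load` returns.

    `rows` is assumed to be an iterable of dict-like records that carry an
    'S3_File_with_Path' key; `load` is any callable that takes a path.
    """
    frames = {}  # created once, before the loop, so entries accumulate
    for i, row in enumerate(rows, start=1):
        path = bucket_path + row["S3_File_with_Path"]
        frames[f"df{i}"] = load(path)
    return frames


# Hypothetical sample data and a stand-in loader (no Spark needed here).
# With Spark, the loader from the question would be:
#   load = lambda p: spark.read.format("parquet").load(p).cache()
rows = [
    {"S3_File_with_Path": "a.parquet"},
    {"S3_File_with_Path": "b.parquet"},
]
frames = load_named_frames(rows, "s3://bucket/", load=lambda p: f"<DataFrame from {p}>")

print(sorted(frames))  # ['df1', 'df2']
```

Each dataframe is then reachable by key, for example frames["df1"]. Keeping them in a dict is usually easier to work with than trying to create real variables named df1, df2, and so on.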
Upvotes: 2